1 Introduction to Statistics


1.1 Introduction 


Statistics is not a new discipline. Its origin goes back to the beginnings of human society: people
used statistics, even unknowingly, in the earliest phases of civilization. In ancient times, statistics
was regarded as the science of statecraft. Governments used it to collect data on population by age and
sex, and on the property and wealth of the state, in order to frame military and fiscal policies.
Historical evidence such as the census of population taken during the construction of the pyramids in
Egypt by the Pharaohs, the counting and recording of losses in Britain during the Napoleonic wars, and
the censuses held in England and Germany in the Middle Ages is regarded as marking the beginning of the
development of statistics. Nowadays, however, statistics embraces almost every sphere of natural and
human activity.


The works of the French gambler Chevalier de Mere (science of probability), De Moivre (normal
probability), Gauss (principle of least squares and normal law of errors), Markov (Markov chains),
Liapounoff (central limit theorem) and others made outstanding contributions to the modernization of
statistics. Similarly, Francis Galton and Karl Pearson pioneered the study of regression analysis and
correlation analysis, which have been widely used in various fields of the modern world. Sir Ronald A.
Fisher applied statistics to diversified fields such as genetics, biometry, psychology, education and
agriculture. For this reason, R. A. Fisher is regarded as the father of statistics.


1.2 Meaning and Definition of Statistics 


The word 'statistics' is derived from the Latin word 'status', the Italian word 'statista', the
German word 'statistik' or the French word 'statistique', each of which means a political state. The
word 'statistics' is used in a singular as well as a plural sense, and it is usually defined in both
senses. In the singular sense, it means the statistical methods and techniques for dealing with
numerical data, that is, the collection, presentation, analysis and interpretation of numerical
figures. In the plural sense, statistics means a systematic collection of quantitative information
about facts (or simply data).

(i) In Singular Sense

Statistics means the science of statistical methods, embodying the theory and techniques used for
collecting, analyzing and drawing inferences from numerical data. It is defined in the singular
sense as follows:

"Statistics is the science of the measurement of social organism, regarded as a whole in all its
manifestations". - A.L. Bowley

"The science and art of handling aggregate of facts, observing, enumeration, recording, classifying
and otherwise treating them". - Harlow

"Statistics may be regarded as the science of collection, presentation, analysis and interpretation of
numerical data". - Croxton and Cowden


(ii) In Plural Sense

Statistics is defined in the plural sense as follows:

"Statistics are the classified facts representing the conditions of the people in a state, especially
those facts which can be stated in number or in tables of numbers or in any tabular or classified
arrangement". - Webster

"Statistics may be defined as the aggregate of facts affected to a marked extent by multiplicity of
causes, numerically expressed, enumerated or estimated according to reasonable standards of
accuracy, collected in a systematic manner, for a predetermined purpose and placed in relation to
each other." - Prof. Horace Secrist

Hence, statistics is a science which studies collections of numerical data for analysis and
interpretation, as well as the methods and principles applied in collecting, presenting, analyzing and
interpreting the data under study.


1.3 Division of Statistics 


Nowadays, the word statistics is used in two contexts. In one context, the singular sense, it refers to
a subject of study that deals with various scientific methods, which are essential from the initial
stage of data collection to the final stage of data presentation. In the other context, the plural
sense, it refers to the numerical results obtained by applying statistical methods to a set of data.
All the published numerical data on business, finance, population, health, environment etc. constitute
statistics in the plural sense.


The subject of statistics is divided into following parts: 
(i) Mathematical statistics (ii) Applied statistics 
(iii) Descriptive statistics (iv) Inferential statistics 


Mathematical statistics deals with the development of statistical theory and methods based on
certain principles and mathematics, while applied statistics deals with the application of statistical
methods to data. In this context, business statistics is considered applied statistics.

Statistics is also divided into descriptive and inferential statistics. Descriptive statistics is used
to summarize or present the data, either numerically or graphically. Numerical descriptors of data
include the following:

(i) One-way or two-way frequency tables.
(ii) Various kinds of summary measures, such as mean, variance, correlation coefficient and so on.
(iii) Statistical models.

Graphical summaries include various kinds of charts and graphs, such as the pie chart, line graph,
scatter plot and so on. The main objective of this book is to describe descriptive statistics.
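
The numerical descriptors listed above can be illustrated with a short sketch. The data below are hypothetical values invented for illustration, and the helper names (`frequency_table`, `pearson_correlation`) are not taken from this book. The sketch uses only the Python standard library and computes the population variance (dividing by n), which is one of several conventions.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pvariance

# Hypothetical data for eight employees (illustrative values only).
departments = ["Sales", "HR", "Sales", "IT", "IT", "Sales", "HR", "IT"]
ages = [25, 32, 41, 29, 36, 48, 52, 30]        # years
salaries = [30, 38, 50, 35, 44, 58, 61, 37]    # thousands per month

def frequency_table(values):
    """One-way frequency table: category -> number of occurrences."""
    return dict(Counter(values))

def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

freq = frequency_table(departments)      # one-way frequency table
avg_salary = mean(salaries)              # summary measure: mean
var_salary = pvariance(salaries)         # summary measure: population variance
r = pearson_correlation(ages, salaries)  # summary measure: correlation

print(freq)
print(avg_salary, var_salary, round(r, 3))
```

A frequency table summarizes a qualitative variable, while the mean, variance and correlation coefficient summarize quantitative variables.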


Inferential statistics is used to draw inferences about the population from sample data drawn from
the population. These inferences may take the form of answers to yes/no questions (hypothesis testing)
or of estimates of numerical characteristics (estimation). Inferential statistical methods are quite
advanced and will not be considered in this book.


Learning statistics requires a knowledge of basic mathematics. As in other disciplines, statistics
has its own vocabulary. One of the most important and most frequently used terms in statistics is
variable. We shall explain such terms whenever we need to use them.


If we observe or measure a characteristic, we find that it takes on different values in different
persons, places or things, and it is customary to call such a characteristic a variable. Some examples
of variables are the salary, age, sex, post and educational degree of the employees of a company, the
amount of daily sales of a store, and the marks obtained by students in an examination. Data are
considered as the values of one variable (in the case of univariate data) or the values of several
variables (in the case of multivariate data). The values of a variable are generated by measuring,
asking about or observing the characteristics of similar subjects under study.

Data generated by quantitative variables (such as salary, age, amount of sales and so on) are called
quantitative data, while data generated by qualitative variables (such as sex, post, educational degree
and so on) are called qualitative data or categorical data.
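
As a rough illustration of these terms, the sketch below stores a few hypothetical employee records and labels each variable as quantitative or qualitative. The records, the function names and the rule "all values are numbers" are assumptions made for this example only, not a general classification method.

```python
# Hypothetical employee records: each key is a variable, each record a subject.
employees = [
    {"salary": 30000, "age": 25, "sex": "F", "post": "Clerk", "degree": "BBA"},
    {"salary": 52000, "age": 41, "sex": "M", "post": "Manager", "degree": "MBA"},
    {"salary": 38000, "age": 33, "sex": "F", "post": "Officer", "degree": "BBS"},
]

def variable_type(values):
    """Label a variable quantitative if every value is a number, else qualitative."""
    is_number = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if is_number else "qualitative"

def classify(records):
    """Return a dict mapping each variable name to its type."""
    names = records[0].keys()
    return {name: variable_type([r[name] for r in records]) for name in names}

types = classify(employees)

# Univariate data: the values of a single variable.
univariate = [r["salary"] for r in employees]
# Multivariate data: the values of several variables observed together.
multivariate = [(r["salary"], r["age"]) for r in employees]

print(types)
```

Note that this simple rule would mislabel a qualitative variable stored as numeric codes (for example, sex coded as 1 and 2), so the type of a variable should be decided from its meaning rather than from its storage format.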


1.4 Functions of Statistics


Statistics is applied everywhere, and no nation can function without facts expressed in figures. The
functions of statistics are as follows:


(i) Statistics simplifies complexity: One function of statistics is to present a huge mass of
figures in a simple, presentable and understandable form. Using various statistical techniques, the
information obtained from a study can be reduced to its simplest form.


(ii) Statistics presents the facts in a definite form: Another important function of statistics is to
present information about facts in a quantitative form. By definition, statistics presents any kind
of information under investigation in figures and numbers, so a conclusion stated in numerical
figures is definite.


(iii) Statistics provides techniques of comparison: Different statistical methods and procedures
facilitate comparison of the relevant features of several sets of data. Statistical measures such as
averages, measures of dispersion, ratios etc. help in comparing phenomena, which enables
conclusions to be drawn.


(iv) Statistics helps in forecasting: In business and industry, forecasting the future on the basis of
past experience and an analysis of historical tendencies is a most important task. This can be done
with the help of several statistical techniques such as regression analysis, analysis of time series,
index numbers etc. Predictions of future trends obtained by applying statistical techniques in an
investigation related to business and management are important and are more convincing in
framing plans and policies.


(v) Statistics helps in formulating policies: The policy of any firm, organization, business
agency, bank or nation will be suitable if it is framed on the basis of statistical analysis and
information. Making future policy is a challenging task. Statistics provides the basic requirements
for framing future policies.


(vi) Statistics gives an idea about the possibility of certain events: Probability theory is one of the
major areas of statistics. We can find the chance of occurrence and non-occurrence of events with
the help of the laws and rules of probability.


(vii) Statistics helps in formulating and testing hypotheses: To draw conclusions and to develop new
theories in economics and business, formulating and testing hypotheses is one of the important
tasks. Formulating and testing a hypothesis without statistical techniques is incomplete, and
testing a hypothesis in the absence of statistical procedures and methods leads to misleading
conclusions.


(viii) Statistics helps to draw valid conclusions: It is a difficult task to enumerate each and every
member of the population (universe) under study. Statistical techniques such as sampling offer a
sound and scientific way to study a sample group and to generalize the conclusions so obtained to
the universe. An investigator or researcher can draw inferences about the whole universe by
applying different statistical techniques. Hence statistics helps to draw valid conclusions from a
study.


(ix) Statistics provides techniques for organizing data scientifically: Collected data are in raw
form and need to be organized scientifically. Statistics provides techniques that help us to do so.
Nowadays it is customary to organize data on a computer. Organization or management of data is
essential in modern research, since well-organized data ease the work of data analysis and help in
drawing out the desired information very quickly.
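
Function (iv) above names regression analysis as one forecasting technique. The following minimal sketch fits a straight-line trend by least squares and extends it one period ahead. The sales figures and the function name `fit_trend` are hypothetical, and a straight line is only one possible model of a trend.

```python
def fit_trend(periods, values):
    """Least-squares straight line: values ~ intercept + slope * period."""
    n = len(periods)
    mean_x = sum(periods) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(periods, values))
    sxx = sum((x - mean_x) ** 2 for x in periods)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical yearly sales (in thousands of units) for five past years.
years = [1, 2, 3, 4, 5]
sales = [10, 13, 15, 18, 19]

intercept, slope = fit_trend(years, sales)
forecast_year_6 = intercept + slope * 6   # extend the fitted line one year ahead

print(round(intercept, 2), round(slope, 2), round(forecast_year_6, 2))
```

Such a forecast assumes that the past trend continues unchanged, so it should be read as an estimate rather than a certainty.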
1.5 Scope and Limitation of Statistics

Statistics is viewed not only as a device for collecting data but also as a means of handling and
analyzing data and of drawing inferences from them. Likewise, it is not merely the by-product of the
administrative set-up of the state; it embraces all kinds of sciences, such as the social, physical and
natural sciences. Because of its widespread use and application in diversified fields such as
agriculture, industry, sociology, biometry, planning, economics, business, management, psychometry,
insurance, accountancy etc., it is almost impossible to think of any sphere of human activity into
which statistics does not enter. Such wide use of statistics in the fields of human activity shows the
scope and importance of statistics. The scope of statistics in different disciplines is discussed as
follows:
(i) Statistics in planning
To achieve an expected goal and objective, planning is the first requirement in any field.
Especially in the field of business and management, planning is resorted to for efficient working
and for formulating policies and decisions. Statistical information related to production,
consumption, prices, demand, supply, investment, income, expenditure etc., as well as advanced
statistical techniques such as index numbers, analysis of time series and regression analysis, are
all used in policy formulation and future planning by business organizations, industry and the
state. Nowadays, efficient planning is necessary in every field. Thus this modern age is also
known as the "age of planning".
(ii) Statistics in economics
Statistical data and advanced techniques of statistical analysis help to solve a variety of economic
problems relating to production, consumption, distribution of income and wealth, wages, prices,
profits, savings, expenditure, unemployment, investment, poverty etc. Statistical techniques have
been used in measuring Gross National Product and in import-export analysis. Furthermore,
advanced statistical techniques have been successfully used in the analysis of cost functions,
production functions and consumption functions. The use of statistics in economics has led to the
formulation and establishment of economic theories and laws such as Engel's law of consumption
and Samuelson's revealed preference analysis, to the use of the analysis of time series, index
numbers and demand analysis in economic planning, and to the development of a new discipline,
econometrics. Thus, the interaction between statistics and economics lies in the effective use of
statistics in the formulation of economic theories and economic policies. In fact, statistics became
so closely integrated with economics that it led to the development of a new subject called
econometrics, which basically deals with economic issues involving the use of statistics.


(iii) Statistics in business and management


It is universally accepted that statistical data and powerful statistical tools such as probability
theory, expectation, sampling techniques, tests of significance, estimation theory, analysis of
time series, index numbers, forecasting techniques and so on play an indispensable role in decision
making. The use of statistical data and techniques is indispensable in almost all branches of
business. It is difficult to succeed in business without a careful study of the market.
Statistics helps in formulating policies and in forecasting the future on the basis of past experience
and records. The analysis of time series is used in business to study trends in order to estimate the
probable demand for goods, and to study seasonal phenomena for determining the 'business cycle',
which may also be termed the four-phase cycle composed of prosperity, recession, depression and
recovery. Another important statistical tool, the index number (economic barometer), helps a
businessman to have an idea about the purchasing power of money.

Statistical techniques have been used by business organizations and management in marketing
decisions, investment, personnel administration, credit policy, inventory control, accounting and
sales control. The use and importance of statistics in business and management is reflected in the
views of the following statisticians.


"Statistics may be regarded as a body of methods for making wise decisions in the face of
uncertainty." - Wallis and Roberts


"Statistics is a method of decision making in the face of uncertainty on the basis of numerical data
and calculated risks." - Prof. Ya-Lun Chou


The uses of statistics in management are as follows:
(a) It helps for making policies, plans and programs. 
(b) Data required for correct managerial decisions come from statistics. 


(c) It helps to identify the factors and events responsible for the overall development of 
organizations. 


(d) It provides a framework for the subject matter of investigation related to management. 

(e) It helps to find out the relationship between variables related to business and management. 

And the uses of statistics in business are as follows: 

(a) In business, it can be used to understand the reasons why the share or commodity markets fall 
and rise. Based on such understandings, one can forecast future market and invest accordingly. 


(b) It can be used to understand the reasons of changing behaviors of consumers from one brand of 
commodity to another or even the future demand of customers. A shrewd businessman can take 
advantage out of such understandings. 


(c) Feasibility study of the market before launching a new product in the market is essential. Statistics 
can help to carry out such study. 


(d) Executives can take data-driven decisions with the help of statistics. 
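
The index number mentioned in this section as an economic barometer can be sketched in a few lines. The text does not say which index formula is intended, so the example assumes the simple aggregative price index (total of current prices divided by total of base prices, times 100) and takes the purchasing power of money as the reciprocal of the index relative to the base period. The commodities and prices are hypothetical.

```python
def simple_aggregative_price_index(base_prices, current_prices):
    """Price index = (sum of current prices / sum of base prices) * 100."""
    if base_prices.keys() != current_prices.keys():
        raise ValueError("both periods must cover the same commodities")
    return sum(current_prices.values()) / sum(base_prices.values()) * 100

def purchasing_power(price_index):
    """Value of one unit of money relative to the base period (base = 1.0)."""
    return 100 / price_index

# Hypothetical prices per unit in the base year and in the current year.
base = {"rice": 50, "oil": 150, "sugar": 80}
current = {"rice": 60, "oil": 180, "sugar": 96}

index = simple_aggregative_price_index(base, current)
power = purchasing_power(index)

print(round(index, 2), round(power, 4))
```

With these figures prices have risen by one fifth, so one unit of money buys about five sixths of what it bought in the base year.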


(iv) Statistics in computer application


Statistics is viewed not only as a device for collecting data but also as a means of handling and
analyzing data and of drawing inferences from them. Likewise, computers are ideally suited for data
analysis in large research projects.


Data analysis on a computer involves the following steps:
(a) Data organization and coding.
(b) Storing the data in the computer.
(c) Selection of appropriate statistical measures/techniques.
(d) Selection of appropriate software package.
(e) Execution of the computer program.


Statistics helps in creating mathematical models, in logical reasoning on a theoretical basis and in
developing algorithms for computer science.


Hence, with computers, storing data and processing them into meaningful information become easy, fast
and safe. The use of internet facilities such as electronic mail makes the communication of data or
information easy. A computer alone is not sufficient to resolve all the problems that arise during data
analysis and the interpretation of information, so statistics is inseparable from computing in such
cases.
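
Steps (a) to (e) above can be sketched with a small program. Everything here is an assumption made for illustration: the coding scheme, the survey responses, the use of an in-memory CSV file in place of a file on disk, and the choice of the mean as the statistical measure.

```python
import csv
import io
from statistics import mean

# (a) Data organization and coding: map text answers to numeric codes.
SEX_CODES = {"male": 1, "female": 2}   # hypothetical coding scheme
raw_responses = [("male", 52), ("female", 61), ("female", 47), ("male", 70)]
coded = [(SEX_CODES[sex], marks) for sex, marks in raw_responses]

# (b) Storing the data: an in-memory CSV stands in for a file on disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["sex_code", "marks"])
writer.writerows(coded)

# (c) Selection of a statistical measure: the mean of the marks.
# (d) Selection of a software package: Python's standard library.
# (e) Execution: read the stored data back and compute the measure.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
marks = [int(row["marks"]) for row in rows]
average_marks = mean(marks)

print(average_marks)
```

The computer carries out the storage and arithmetic, but the choice of codes and of the measure in steps (a) and (c) still depends on statistical judgement.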
Limitations of Statistics

The scope of statistics signifies its importance in each and every field of science. However, it has
some limitations. They are as follows:

(i) Statistics does not deal with individuals and works on the aggregate level only: Statistical
findings are usually interpreted in terms of averages, which may not be true of every individual. It
deals with a group of individuals and an aggregate of facts that indicate the characteristics of the
whole group. So, individual items are not recognized.

(ii) Statistics does not directly deal with qualitative phenomena: Statistical methods and techniques
are applicable only to data expressed in numerical figures. Statistics deals with the quantitative
information under investigation. It deals with qualitative characteristics such as intelligence,
beauty, aptitude, knowledge etc. only after changing them into numerical figures with the help of
several tools.


(iii) Statistical laws are not exact: Statistical laws are not absolute. Most statistical analysis of
data is based on statistical measures that are not absolute in nature, for example the correlation
coefficient, skewness and kurtosis. Similarly, when a coin is tossed the probability of getting a
head is one half, but when a coin is tossed six times the proportion of heads obtained may not be
exactly one half.


(iv) Statistics may be misused: Only a person who has expert knowledge of statistics can handle
statistical data efficiently. If sufficient attention is not paid to collecting, analyzing and
interpreting the data, the statistical results may be misleading.


(v) Statistics cannot prove anything deductively: The logic employed in inferential statistics is
inductive in nature [drawing inferences from a small part (sample) about a larger part
(population)], which is the opposite of the deductive logic used in mathematics. An inductive
argument does not prove anything with certainty.

(vi) Statistical results are sometimes distrusted: Conflicting statistical statements are sometimes
found in the literature, particularly in the medical sciences, for example statements like
"doing X reduces high blood pressure" alongside statements like "doing X actually worsens high
blood pressure". Such statements usually hold under different conditions, but many readers may
fail to notice these distinctions, or the media may oversimplify this vital contextual information,
and the public's distrust of statistics is thereby increased.


(vii) Statistical methods or techniques may be faulty in the case of heterogeneous data.


(viii) Some errors are possible in statistical decisions. A person without statistical training may not
know whether an error has been committed or not.
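
The coin example in limitation (iii) can be checked with a small simulation. This is only a sketch: it assumes a fair coin imitated by Python's pseudo-random number generator, and the seed and the numbers of tosses are arbitrary choices.

```python
import random

def proportion_of_heads(n_tosses, rng):
    """Toss a fair coin n_tosses times and return the proportion of heads."""
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses

rng = random.Random(42)   # fixed seed so that the run can be repeated

# The probability of a head is 0.5 on every toss, yet the observed proportion
# in a short run can differ from 0.5; it tends to settle near 0.5 only as the
# number of tosses grows.
for n in (6, 60, 600, 60000):
    print(n, proportion_of_heads(n, rng))
```

A statistical law of this kind describes what happens on average over many trials, not what must happen in any particular small set of trials.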


Distrust of Statistics 


The improper use of statistical tools by people who have no detailed knowledge of statistics has led
to public distrust of statistics. If irresponsible, inexperienced or dishonest persons use statistical
data and techniques, statistics loses public belief, faith and confidence. Thus, distrust of statistics
may arise for the following reasons:


(i) Figures are innocent and believable. 


(ii) Figures put forward in arguments may be inaccurate and incomplete, thus distorting the facts.
(iii) Though the figures are accurate, a dishonest person may manipulate them for personal
and selfish motives. If so, the public will be misled.
Hence, utmost care and precaution should be taken, as far as possible, in the interpretation of
statistical data.
1.5.2 Data Collection


This is the age of information and technology. Numerically expressed information is known as
data, so data is one of the main sources of information. The process of getting the necessary
information on the subject under investigation is called collection of data. Collection of data is the
first step in a statistical investigation. The data collected constitute the foundation of statistical
analysis. Therefore care must be taken while collecting data, otherwise the conclusions drawn can
never be reliable. In the process of collecting data, the persons from whom the information is
collected are known as 'respondents' and the person who conducts the statistical inquiry is known as
the 'investigator'.


It is important that the information obtained from the data should be able to answer the subject of
our enquiry; consequently, utmost care must be taken to collect data that are as reliable and relevant
as possible. For this purpose, several crucial steps need to be followed during data collection to
ensure that the collection process and measurement systems are reliable. Incorporating these steps
into a data collection plan improves the likelihood that the data and measurements can be used to
support the resulting analysis.


The investigator has to carry out a preliminary analysis of the problem in question and must take
clear-cut decisions on the following points before starting the work of data collection: statement of
the problem, scope of enquiry, sources of information, methods of data collection, unit of data
collection, degree of accuracy, and nature and type of enquiry.


Types and Sources of Data 

For any statistical inquiry, the basic problem is to collect facts and information relating to the
particular phenomenon under study. Data are the raw materials of statistical analysis, from which
conclusions are drawn. Data may be either quantitative or qualitative in nature. The person who
conducts the inquiry or collects the data for study is known as the investigator, and the persons who
give the information to the investigator are known as respondents. The process of counting or
enumeration together with the systematic recording of the information is called the collection of data.

It is accepted that data collection is the first step in any type of statistical investigation. Thus,
the accuracy and precision of a study depend on the data collected for it. This means that the entire
structure of the statistical analysis and interpretation rests on a systematic collection of data that
are reliable and adequate. This is why some points should be examined carefully before collecting the
data for a statistical investigation. They are termed the preliminaries of data collection. The
preliminaries of data collection are as follows:

(i) Objective and scope of inquiry: It is essential to define the objective or purpose of the inquiry
clearly. This enables the investigator (researcher) to collect information properly. In the absence
of a clear objective, irrelevant information may be collected and important information omitted,
and this may lead to fallacious conclusions.

The scope of the inquiry means its coverage with respect to the type of information, the subject
matter and the geographical area. It determines the size and type of sampling, the selection of the
universe of the investigation and the procedures. Thus the decision about the type of inquiry to be
conducted depends upon the objectives and scope of the inquiry.

(ii) Statistical units to be used: A well-defined and identifiable object or group of objects with
which the measurements or counts in a statistical investigation are associated is called a
statistical unit. Examples in a social survey are an individual person, a family, a shop in a
locality etc. A very important step before the collection of data is to define clearly the
statistical units on which data are to be collected.
(iii) Source of information: For any statistical inquiry, the investigator may collect the
information first hand or may use data from other published sources. Data collected originally by
the investigator for the first time for the study are known as primary data. If he/she uses data
which have already been collected, such as the publications and reports of government,
semi-government and non-government organizations, magazines, newspapers, research journals etc.,
the data are known as secondary data.


(iv) Method of data collection: The next thing is to decide the method of data collection. If primary
data are to be collected, a decision has to be made whether the census method or the sample method
is to be used. The choice between the census method and the sample method depends upon the
objectives and scope of the study and the limitations of resources in terms of time, money,
manpower etc. In the case of secondary data, the investigator must carefully test and edit the data
for reliability, suitability, adequacy and accuracy.

(v) Degree of accuracy aimed at in the final results: Information from any already completed
sample study on the subject about the precision achieved for a given sample size may serve as a
useful guide in this matter, provided there is no fundamental reason to doubt this empirical basis.
In any statistical enquiry, perfect accuracy in the final results is practically impossible to
achieve because of errors in measurement, collection of data, analysis of the data and
interpretation of the results. This should not be understood to imply that accuracy should be
sacrificed in order to conduct the enquiry at low cost.

(vi) Types of enquiry: Another point to be kept in mind before collecting the data is the type of
enquiry. Several types of enquiry are:
i. Official, semi-official or unofficial ii. Initial or repetitive
iii. Confidential or open iv. Direct or indirect
v. Regular or ad hoc vi. Census or sample
vii. Primary or secondary
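
The choice between the census method and the sample method described in point (iv) can be illustrated with a short simulation. The population of marks below is generated artificially, and the population size, the sample size and the seed are arbitrary assumptions made for this sketch.

```python
import random
from statistics import mean

rng = random.Random(1)   # fixed seed so that the run can be repeated

# Hypothetical universe: marks of 10,000 students, each between 20 and 100.
population = [rng.randint(20, 100) for _ in range(10_000)]

# Census method: every unit of the universe is enumerated.
census_mean = mean(population)

# Sample method: only 200 units, drawn at random without replacement.
sample = rng.sample(population, 200)
sample_mean = mean(sample)

# Sampling error: the difference between the sample result and the census result.
sampling_error = sample_mean - census_mean

print(round(census_mean, 2), round(sample_mean, 2), round(sampling_error, 2))
```

The sample covers only a small fraction of the units, which saves time, money and manpower, at the price of a sampling error that the census does not have.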

1.5.3 Types of Data 


For any investigation or enquiry, the collection of data is most important because data are the raw
materials from which the enquiry reaches its final conclusion. All the information (quantitative or
qualitative) collected from the respondents and used for the purpose of statistical analysis is termed
data. Based on the source, data are generally classified into two types.


They are
i. Primary data ii. Secondary data


1.6 Primary Data and Methods of Primary Data Collection 


1.6.1 Primary Data 


Data which are originally collected by an investigator for the first time for a statistical
analysis are known as primary data. Such data are fresh, first hand and original in nature. Primary
data are collected for a specific purpose of study or investigation. The source of this type of data is
called a primary source of data. For example, if an investigator wishes to study the average marks of
students in statistics in a college, then the data collected for this purpose by the investigator
himself/herself are primary data.
1.6.2 Methods of Collection of Primary Data


The following methods are commonly used for the collection of primary data:


1. Direct personal interview 2. Indirect personal investigation method


3. Mailed questionnaire method 4. Schedules sent through enumerators


5. Information from local correspondents


Direct personal interview method: In this method the investigator collects the data from the
respondents directly. The investigator meets the respondents personally, makes the necessary
inquiries and obtains the required data from them. Thus, it is suitable if the enquiry is
intensive rather than extensive. Since such investigations require the personal attention of the
investigator, the information gathered is original in nature. The investigator can also collect
additional information as per his/her need.


Merits of direct personal interview method are as follows:
i. Information collected using this method is accurate and original.
ii. When the respondents are approached personally by the investigator, the response is likely to
be more encouraging.
iii. The data collected using this method are reliable.
iv. This method is flexible.
v. The investigator can obtain proper information from the respondents by talking to them at their
own educational level and in their own language.
vi. Personal bias in the responses can be detected by asking the respondents cross-questions.

Demerits of direct personal interview method are as follows:

i. This method is only suitable for intensive studies.

ii. It is not useful for a wide area of inquiry.

iii. It is expensive in terms of time, money and manpower.

iv. This method requires intelligent, skillful, trained, tactful and courageous manpower. Otherwise
the inquiry cannot be reliable, valid and satisfactory.

v. The respondents may give biased information, which leads to wrong conclusions.

Indirect personal investigation method: This method is used when the informants are reluctant to
give definite information. Respondents hesitate to provide true information about matters such as
property, income, personal habits like smoking, drug addiction and alcoholism, girl trafficking,
and diseases like HIV/AIDS. In such cases indirect oral investigation is more practicable and
suitable. In this method, third persons, called witnesses, are used to collect the information. A
police report is one example of the indirect oral investigation method.

Merits of indirect personal investigation are:
i. This method is less expensive than the direct personal interview method.
ii. Expert views and suggestions on the problem can be solicited.
iii. This method is convenient for gathering sensitive information with the help of witnesses.
iv. A wide area can be covered by the investigation.
v. This method is appropriate if the respondents are reluctant to give information.


Demerits of indirect personal investigation are:

i. Due to lack of direct supervision and touch, the information may be inaccurate and unreliable.
ii. If a wrong and improper witness is selected, the information given by them will be biased.
iii. There will be lack of interest and willingness of the witness to give the information.

Mailed questionnaire method: A set of questions relating to the subject of inquiry is known as a
questionnaire. Space for the answers/responses to be filled in by the respondents is provided. The
questionnaire is mailed to the respondents with a request for quick response within the specified time
and return to the investigator. Respondents must be educated in this method. When the field of
investigation is large and the investigator requires quick results at low cost, this method is more
suitable in practice than other methods.


Merits of mailed questionnaire method are as follows: 

i. This method is economic in terms of time, money and manpower. 

ii. This method is used for extensive inquiries covering a very wide area. 

iii. It is assumed that educated persons never lie. So, the information obtained is original and 
authentic. 

iv. Errors due to personal biases of the investigators are eliminated. 

Demerits of mailed questionnaire method are as follows: 

i. This method of data collection is not applicable for the uneducated respondents. 


ii. In this method, the informants may be afraid to respond to the questionnaire, so there is a high
degree of non-response error.

iii. Respondents may give wrong answers (information) to the questionnaire.
iv. Some questions contained in the questionnaire may affect the feelings of respondents.


Schedules sent through enumerators: This method is distinct from the questionnaire method in
the process of gathering information. In the questionnaire method discussed above, the responses
to the questions are filled in by the respondents themselves and returned to the investigator.
In this method, the investigator selects (appoints) enumerators or agents and trains them to collect
information from the field of inquiry. The investigator sends the enumerators with a schedule (a list
of questions) to the respondents; they ask the questions and record the replies. This
method is more practicable when the respondents are illiterate. It is generally applied in the
population census conducted by the government.


Merits of schedules through enumerator method are as follows:

i. Non-response error can be minimized using this method.
ii. Information through this method is more accurate than the mailed questionnaire method.
iii. This method can be used even when the respondents are illiterate.
iv. Enumerators can check the accuracy of the information by asking cross questions.
v. It is suitable for a wide area of investigation.

Demerits of schedule sent through enumerator method are as follows:

i. This method is expensive regarding time, cost and manpower.
ii. Enumerators must be well-trained. Otherwise, the information may not be correct, which results
in a fallacious conclusion of the investigation.
iii. The enumerator may not be responsible in collecting information.
iv. The respondents may not believe the enumerators.

5. Information through correspondent or local agency method: In this method, the information is
not formally collected by the investigator or the enumerators. The investigator appoints
correspondents or local agents/agencies in different places to collect the information for the inquiry.
These correspondents or local agencies/agents collect the information in whatever way they find
appropriate and then submit their reports to the central or head office, where the data
are processed for the final analysis. This method of data collection is usually used by media
agencies.


Merits of information through correspondent method are as follows:

i. This method of data collection can cover a wide area of inquiry.
ii. It is the cheapest method.
iii. This method is more appropriate to get regular information.

Demerits of information through correspondent method are as follows:

i. Due to the personal prejudice and bias of the correspondent, the information may not be
accurate and reliable.
ii. It is inconvenient and time consuming to use in the absence of communication facilities such
as telephone, internet etc.


1.6.3 Problems of Primary Data Collection 


While collecting primary data, various problems have to be faced. The problems or difficulties
which arise during primary data collection and secondary data collection are discussed below:

Problems in collecting primary data are as follows:

i. Dishonest and irresponsible respondents may show non-response character.
ii. Real information may not be collected if the respondents do not clearly understand the questions
sent by the investigator.
iii. A problem may arise due to lack of transportation facility.
iv. There is a high degree of non-response error from uneducated respondents.
v. Personal bias and prejudice of enumerators cause a fallacious conclusion of the
investigation.
vi. Expert, knowledgeable, skillful, trained and intelligent manpower is required for preparing the
questionnaire and collecting primary data, which is not possible and accessible in all cases.
vii. If the scope of inquiry is wide, money, time and manpower should be sufficiently available.
Otherwise, inadequate and inaccurate data may be obtained.


1.7 Secondary Data and Methods of Secondary Data Collection 


1.7.1 Secondary Data 


The data which have already been collected and processed by some agency or person and are taken over
from there and used by any other agency or person for their statistical analysis are known as secondary
data. Such data may not be original in nature. Thus secondary data may be less accurate than primary
data. In some inquiries, collection of primary data is not always practicable due to limited availability
of time, money and manpower. There is a lot of published and unpublished information from which further
studies can be made.

1.7.2 Methods of Secondary Data Collection

The sources of secondary data are mainly classified into two types. They are:

i. Published sources
There are a number of national and international organizations or agencies who collect statistical data
relating to business, trade, education, health, population, poverty, consumption, import, export,
money exchange etc. and publish their findings in reports. These publications serve as the source of
secondary data.
Published sources of secondary data are as follows:
i. Reports and publications of ministries, departments of the government, semi-government
offices, NRB, FNCCI etc.
ii. Reports and publications of worldwide reputed INGOs such as UNDP, UNO, UNESCO,
WTO, WHO, SAARC, World Bank etc.
iii. Reports and publications of reputed NGOs, research journals, periodicals, dissertations etc.
iv. Reports of various committees and commissions appointed by the Government.

ii. Unpublished sources
Statistical data need not always be published.
There are various sources of unpublished statistical data. They are:
i. Records maintained by government offices.
ii. Research carried out by individual research scholars and faculty members in the universities.
iii. Records updated by departments and institutions for their internal purposes.
iv. Records maintained by private firms or business enterprises which they do not like to publish.

1.8 Merits and Demerits of Secondary Data 
1.8.1 Merits of Secondary Data are as Follows 


a. It saves time and money. 
b. The scope of inquiry can be increased in terms of area and time period. 


c. If investigator is expert and skillful to filter and gather the required information, the quality of 
secondary data is better. 
d. Investigation using secondary data can have more references to consult. 
Demerits of secondary data are as follows: 
a. Data may not be in the exact form of the requirement of the investigator. 
b. The data may be outdated. 
c. When secondary data are gathered from two different sources, it may not be comparable in terms 
of definition, units and time period covered. 


d. The degree of accuracy and reliability of the investigation depends on the previous investigation or
agency.

e. The exact definition of units and terms used in secondary data may be unknown.

Precautions in Using Secondary Data

Before using the secondary data, the investigator should check them against the given problem under
investigation. The points to be considered are as follows:

i. Reliability of Data: Reliable data used in a study ensure the reliability of the conclusion to be
drawn. The result of an investigation will be misleading if the data are not reliable. It
is essential that the agency or organization was unbiased in collecting the data, in the sense that it had
no personal motives or interest. So, the investigator should be careful about the following points:

i. The reliability and integrity of the organization or institution.
ii. The reliability of the sources of information.
iii. The methods used for the collection of the data.
iv. The techniques and procedures used to analyze the data.

ii. Suitability of Data: Even if the data are reliable, they may not be suitable for the purpose of the
inquiry under study. Thus, before use, it should be confirmed whether the data are suitable for the
study or not. For this, it is necessary to check the homogeneity of the original inquiry and the
investigation in hand in terms of objective, nature, scope, conditions, terms and units used.

iii. Adequacy of Data: Data that are reliable and suitable for use in another investigation must still
be checked for adequacy. Suitable and reliable data of one study may not be adequate and sufficient
for another study or investigation. To draw a valid conclusion and to obtain a reliable result using
secondary data, the investigator must check first whether the data are reliable, second whether they
are suitable and finally, right before use, whether they are adequate.
Otherwise, the objective of the study may not be achieved.


Problems in Collecting Secondary Data 
The problems of collecting secondary data are as follows: 
a. Suitable, reliable and adequate data for the inquiry under investigation can rarely be obtained.
Sometimes, unpublished data cannot be obtained as the person holding them hesitates to give them.
b. Irrelevant or duplicate data may be collected.
c. The data collected could be flawed or misinterpreted.
d. Secondary data which are suitable and reliable for a study may still not be adequate
for the study.


Differences between Primary Data and Secondary Data 

As we have discussed above, data which are primary for one investigator are treated as secondary
by another. They differ mainly in terms of the method and mode of collection. Some
inquiries require primary data only, secondary data are sufficient and adequate in several
inquiries, and both types of data may be essential for some investigations. Primary data and
secondary data differ from each other as follows:


Primary data | Secondary data
i. Primary data are accurate and original. | i. Secondary data may not be accurate and original in the sense that they are collected by others.
ii. Methods of primary data collection are more expensive in terms of time, cost and manpower. | ii. Methods of secondary data collection are less expensive.
iii. Primary data may be influenced by personal bias and prejudice of the investigator. | iii. Secondary data may not be influenced by personal bias and prejudice of the investigator.
iv. It is mostly used in establishment of new theory. | iv. It is mostly used in statistical investigation and analysis.
v. It is collected as per the objective and scope of the investigation to be carried out by the investigator. | v. It might have been collected with different objectives.

Exercise 1.1


Theoretical Questions

1. Define statistics and its types. Discuss its functions and limitations.
2. Define statistics and explain its uses and applications in computer application.
3. Why do people distrust statistics? Justify with reasons.
4. Define primary data and explain the problems of collecting primary data.
5. Define secondary data. What are the problems of collecting secondary data? Explain.
6. Differentiate between primary data and secondary data.
7. Discuss the methods of primary data collection.
8. Explain questionnaire method with its merits and demerits.
9. Differentiate direct personal interview and indirect personal investigation method.
10. What are the sources of secondary data? What precautions should be kept in mind before using the
secondary data?
11. Explain the application of statistics in the field of computer application.


Exercise 1.2 


Multiple Choice Questions: Circle (O) the correct answer. 


1. How many types of data are there on the basis of sources of data collection? 

(a) 1 (b) 2 (c) 3 (d) 4 
2. The statement, "Statistics is both a science and an art", was given by: 

(a) R.A. Fisher (b) Tippet (c) L.R. Connor (d) A.L. Bowley 
3. Who stated that statistics is a branch of applied mathematics which specializes in data? 

(a) Horace Secrist (b) R.A. Fisher (c) Ya-lun-chou (d) L.R. Connor 
4. The word "statistics" is used as: 

(a) Singular (b) Plural (c) Singular and plural both 

(d) None of the above 
5. "Statistics provides tools and techniques for research workers", was stated by: 

(a) John I. Griffin (b) W.I. King (c) A.M. Mood (d) A.L. Boddington 
6. Data taken from the publication, ‘Agricultural Situation in Nepal’ will be considered as: 

(a) primary data (b) secondary data (c) primary and secondary data 

(d) neither primary nor secondary data 
7. Which of the following represents data?

(a) a single value (b) only two values in a set (c) a group of values in a set (d) none of the above
8. Statistics deals with:

(a) qualitative information
9. Statistical results are:

(a) cent per cent correct (c) always incorrect (d) misleading


Answer Key 


2 Descriptive Statistics


2.1 Introduction 


One of the important objectives of statistical analysis is to describe the characteristics of a frequency
distribution by determining various numerical measures. To analyze and interpret the main characteristics
of a frequency distribution, it is required to determine numerical measures such as central value,
dispersion, skewness, kurtosis, correlation etc. Averages are the representative values of the frequency
distribution which give us the gist of the nature and characteristics of a huge mass of unwieldy numerical data.


After the data have been classified and tabulated, the next step is to analyze it. However, tabular, 
diagrammatic and graphical approaches are the visual illustration of the unorganized data. These 
techniques are not capable of describing the quantitative data in detail. Therefore, one of the most 
important objectives of statistical analysis is to determine various numerical measures which describe the 
inherent characteristics of a frequency distribution. The first of such measures is “average”. The averages 
are the measures which condense a huge unwieldy set of numerical data into single numerical values 
which are representative of the entire distribution. 


2.2 Measures of Central Tendency 


Averages have typical nature that all other items (values) of the distribution concentrate around the 
center. Averages are the values in the central part of the frequency distribution which give us an idea 
about concentration of the values. So, they are also referred as the "Measures of central tendency". 


Definition: The single value that can represent the whole statistical data is known as the central value, and
its nature is known as the measure of central tendency. It lies in the central part of the data. For example,
"Ram is an average student" means that he gets a central mark in the whole class.


Objectives of Central Tendency 
i) To get single value that represents the entire data 
ii) To facilitate comparison 
iii) To present the salient feature of a mass of complex data 


iv) To help computing various other statistical measures such as dispersion, skewness, kurtosis, 
correlation, regression and various other basic characteristics of a mass of data 


v) To trace mathematical relation 
vi) To help in decision making. 


Requisites of a Good Average 


i) It should be rigidly defined: The average should be defined such that it has only one
interpretation by different persons.
ii) It should be easy to calculate and simple to understand.
iii) It should be based on all the observations.
iv) It should be suitable for further mathematical treatment.
v) It should be least affected by the extreme observations. That is, the smallest or largest
observations should not unduly affect the value of a good average.
vi) It should be least affected by fluctuations of sampling.


2.3 Various Measures of Central Tendency

The measures of central tendency commonly used in practice are as follows:

1. Mean
a. Arithmetic Mean (A.M.)
(i) Simple and (ii) Weighted
b. Geometric Mean (G.M.)
c. Harmonic Mean (H.M.)
2. Median
3. Mode

2.4 Arithmetic Mean


The arithmetic mean is the most popular and widely used measure of central tendency. It is also
called simply 'the mean' or 'the average'. It is also considered an ideal measure of central tendency, or
the best-known (golden) measure of central tendency, because it satisfies almost all requisites of an ideal
measure of central tendency given by Prof. Yule. Arithmetic Mean (A.M.) is the most commonly used
measure among all the averages. This is due to the simplicity of its calculation and other advantages. It is
used to calculate the average value of quantitative data when the distribution does not have very large and
very small items. It is also used to obtain the average value of a distribution having closed-ended class
intervals and no extreme items.

Definition: The arithmetic mean of a given set of observations is their sum divided by the number of
observations. The sample mean (a sample statistic) is denoted by $\bar{X}$ (read as "X bar").

The population mean (a population parameter) is denoted by $\mu$.

Arithmetic mean (A.M.) is called an ideal measure (or best measure, or golden measure) of central
tendency due to the following reasons:

i) In the study of computer science, engineering, social sciences, and economic or commercial
problems such as production, income, prices etc., A.M. is used to calculate the average.
ii) It satisfies almost all properties (or requisites) of an ideal measure of central tendency.
iii) It is rigidly defined.
iv) It is based on all the observations.
v) It is simple to understand and easy to calculate.
vi) It is suitable for further mathematical treatment.
vii) It is least affected by fluctuation of sampling.
viii) It is quite familiar to the layman.
ix) It has wide applications in statistical theory at large.


Uses of Arithmetic Mean 


Arithmetic Mean (A.M.) is a more suitable average than others when we are dealing with quantitative
measures such as average bonus, average income, average sales, average height, average expenditure,
average revenue etc.


2.4.1 Calculation of Arithmetic Mean 


a) Individual Series 


Individual series is ungrouped data where every value of an individual item is listed singly
after observation. In this ungrouped data, the arithmetic mean is calculated as follows:

i) Direct Method

Let $X_1, X_2, X_3, \ldots, X_n$ be the n variate values of a random variable X. Then the arithmetic mean is
computed by the following formula:

$$\bar{X} = \frac{\sum X}{n}$$

where, $\sum X$ = the sum of observations
n = the number of observations.
ii) Short-cut Method (or Assumed Mean Method or Change of Origin Method or Deviation Method) 


If the number of observations is very large and the values of the observations are
also large (i.e. the given figures have many digits), calculation of the mean (A.M.) by the direct method is
tedious and time consuming. In this case, we take the deviations of the items from any arbitrary
number for computing the A.M. This method is known as the assumed mean method, short-cut
method or deviation method. The formula for calculating the A.M. (mean) by this method is
defined by

$$\bar{X} = A + \frac{\sum d}{n}$$

where, A = Assumed mean or arbitrary value
d = X - A = Deviations of the items from the assumed mean 'A' (origin changed)
n = Number of observations.
Note: There is no hard and fast rule for the selection of 'A', but it is better to take a value
between the highest and lowest values.
iii) Step Deviation Method or Change of Origin and Scale Method or Coding Method 


For large values of observations, sometimes the values are changed into smaller values by the
change of origin and scale. For this, observations are multiplied or divided by a constant, and
this method is called the step deviation method. The formula for calculating A.M. by the step deviation
method is given by

$$\bar{X} = A + \frac{\sum d'}{n} \times h$$

where, $d' = \frac{X - A}{h}$, A = Assumed mean
h = Common factor (scale changed by dividing by a factor)
n = Number of observations

Note: There is no hard and fast rule for the selection of A and h, but it is better to take the value of
A between the highest and lowest values and to take h as a common factor of the values.
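The three formulas for an individual series can be compared with a minimal Python sketch. The function names and the sample values are illustrative assumptions, not part of the text; the point is only that all three methods return the same mean.

```python
def mean_direct(values):
    """Direct method: sum of observations divided by their number."""
    return sum(values) / len(values)


def mean_shortcut(values, assumed_mean):
    """Short-cut method: A + (sum of deviations d = X - A) / n."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)


def mean_step_deviation(values, assumed_mean, h):
    """Step-deviation method: A + (sum of d' = (X - A) / h) / n * h."""
    coded = [(x - assumed_mean) / h for x in values]
    return assumed_mean + sum(coded) / len(values) * h


# Illustrative observations (an assumption for this sketch).
observations = [50, 55, 60, 65, 70, 75]

print(mean_direct(observations))
print(mean_shortcut(observations, assumed_mean=60))
print(mean_step_deviation(observations, assumed_mean=60, h=5))
```

Any value of A gives the same result; choosing A near the middle of the data and h as a common factor only keeps the hand arithmetic small.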


Example 2.1: The following are the daily incomes of five persons in a certain locality.

Income (in Rs.)

Calculate the average income.

Solution: Computation of the average income

400
350

$\bar{X} = \frac{\sum X}{n} = 400$ (For use of calculator see unit 8)

Hence, the average income is Rs. 400.


Example 2.2: The following table gives the monthly income of 10 employees in an office:

Employee | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Income (Rs.) | 4780 | 5760 | 6690 | 7750 | 4840 | 4920 | 6100 | 7810 | 7050 | 6950

Calculate average income.

Solution: Here, taking 7000 as the assumed mean, i.e. A = 7000

Employee | Income (Rs.) (X) | d = X - 7000
1 | 4780 | -2220
2 | 5760 | -1240
3 | 6690 | -310
4 | 7750 | 750
5 | 4840 | -2160
6 | 4920 | -2080
7 | 6100 | -900
8 | 7810 | 810
9 | 7050 | 50
10 | 6950 | -50
Total | | $\sum d = -7350$

Mean $(\bar{X}) = A + \frac{\sum d}{n} = 7000 + \frac{(-7350)}{10} = 7000 - 735 = 6265$


Example 2.3: Calculate the average wage by using the step deviation method from the following data.
Wage (Rs. '00'): 50, 55, 60, 65, 70 and 75
Solution: Taking assumed mean (A) = 60 and common factor (h) = 5

Wage (Rs. '00') X | d' = (X - 60)/5
50 | -2
55 | -1
60 | 0
65 | 1
70 | 2
75 | 3
Total | $\sum d' = 3$

Here, n = 6, A = 60 and h = 5 (common factor)
Arithmetic mean $(\bar{X}) = A + \frac{\sum d'}{n} \times h = 60 + \frac{3}{6} \times 5 = 60 + 2.5 = 62.5$
Hence, average wage = Rs. 62.5 × 100 = Rs. 6,250.
b) Discrete Series 


If the data is presented along with their corresponding frequencies, then it is called discrete series (or 
discrete frequency distribution). 


i) Direct method: 
Let $X_1, X_2, X_3, \ldots, X_n$ be the variate values of a random variable X and $f_1, f_2, f_3, \ldots, f_n$ be their
respective frequencies. Then in discrete series, A.M. $(\bar{X})$ is given by

$$\bar{X} = \frac{f_1X_1 + f_2X_2 + \cdots + f_nX_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum fX}{N}$$

where $N = \sum f$ = Total frequency


ii) Short-Cut Method (or Assumed Mean or Coding or Change of Origin Method or 
deviation method) 


The formula for calculating the arithmetic mean $(\bar{X})$ using this method is given by

$$\bar{X} = A + \frac{\sum fd}{N}$$

where, A = Assumed mean, $N = \sum f$ = Total frequency
d = X - A = Deviation of the items from the assumed mean 'A'


iii) Step-Deviation Method (or Coding Method or Change of Origin and Scale Method) 


Generally, coding refers to the transformation of data by adding (or subtracting) or multiplying 
(or dividing) a constant. The addition or subtraction of a constant is called change of origin 
where as the multiplication or division by a constant is known as change of scale. The method 
of changing origin as well as scale is also known as step deviation method. Thus, the formula 
for calculating the arithmetic mean by this method is given by 


$$\bar{X} = A + \frac{\sum fd'}{N} \times h$$

where, A = Assumed mean, $d' = \frac{X - A}{h}$
$N = \sum f$ = Total frequency, h = Common factor

Note: In case of individual and discrete series (ungrouped frequency distribution), the step deviation
method can be used to calculate A.M. only when 'h' can be taken as a common factor from all the
items of the given distribution.


Example 2.4: Calculate the arithmetic mean for the following frequency distribution.

x | 5 | 10 | 15 | 20 | 25
f | 2 | 4 | 7 | 3 | 1

Solution: Calculation of mean using direct method

x | f | fx
5 | 2 | 10
10 | 4 | 40
15 | 7 | 105
20 | 3 | 60
25 | 1 | 25
Total | N = 17 | $\sum fx = 240$

$\bar{X} = \frac{\sum fX}{N} = \frac{240}{17} = 14.12$ (For use of calculator see unit 8)


Example 2.5: Determine the average wage of 75 employees in a company by changing the origin of data
(i.e. short-cut method) from the data given below:

Wages (Rs. '00') | 10 | 20 | 30 | 40 | 50 | 60 | 70
No. of employees | 20 | 15 | 12 | 10 | 4 | 8 | 6

Solution: Calculation of average wage using short-cut method, A = 40

Wages (Rs. '00') (X) | d = X - 40 | f | fd
10 | -30 | 20 | -600
20 | -20 | 15 | -300
30 | -10 | 12 | -120
40 | 0 | 10 | 0
50 | 10 | 4 | 40
60 | 20 | 8 | 160
70 | 30 | 6 | 180
Total | | N = 75 | $\sum fd = -640$

Average wage $(\bar{X}) = A + \frac{\sum fd}{N} = 40 + \frac{(-640)}{75} = 40 - 8.53 = 31.47$

Hence, average wage = 31.47 × Rs. 100 = Rs. 3,147
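The discrete-series formulas can be checked with a short Python sketch using the wage data of the example above. The function names are assumptions chosen for this illustration; the step-deviation call uses h = 10, which is a common factor of these wages.

```python
def discrete_mean_direct(xs, fs):
    """Direct method: sum(f * X) / N, with N = sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def discrete_mean_shortcut(xs, fs, assumed_mean):
    """Short-cut method: A + sum(f * d) / N, with d = X - A."""
    total_fd = sum(f * (x - assumed_mean) for x, f in zip(xs, fs))
    return assumed_mean + total_fd / sum(fs)


def discrete_mean_step(xs, fs, assumed_mean, h):
    """Step-deviation method: A + sum(f * d') / N * h, with d' = (X - A) / h."""
    total_fd = sum(f * (x - assumed_mean) / h for x, f in zip(xs, fs))
    return assumed_mean + total_fd / sum(fs) * h


wages = [10, 20, 30, 40, 50, 60, 70]      # in Rs. '00'
employees = [20, 15, 12, 10, 4, 8, 6]     # frequencies, N = 75

print(round(discrete_mean_direct(wages, employees), 2))
print(round(discrete_mean_shortcut(wages, employees, 40), 2))
print(round(discrete_mean_step(wages, employees, 40, 10), 2))
```

All three calls agree with the hand computation, since changing the origin or scale only rearranges the same sum.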


Example 2.6: Following are the marks secured by 40 students. Calculate average marks changing origin
and scale (Step-deviation method).

Marks (X) | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90
No. of students | 2 | 4 | 6 | 10 | 7 | 5 | 3 | 2 | 1

Solution: We have, A = Assumed mean = 50
h = Common factor = 10

Calculation of average marks using step-deviation method

Marks (X) | Frequency (f) | d' = (X - 50)/10 | fd'
10 | 2 | -4 | -8
20 | 4 | -3 | -12
30 | 6 | -2 | -12
40 | 10 | -1 | -10
50 | 7 | 0 | 0
60 | 5 | 1 | 5
70 | 3 | 2 | 6
80 | 2 | 3 | 6
90 | 1 | 4 | 4
Total | N = 40 | | $\sum fd' = -21$

Now, Arithmetic mean $(\bar{X}) = A + \frac{\sum fd'}{N} \times h = 50 + \frac{(-21)}{40} \times 10 = 50 - 5.25 = 44.75$
Hence, average marks = 44.75


c) Continuous Series (Grouped Data)

When the observations are classified into short ranges of values along with class frequencies, the
data are said to be grouped data, a continuous series or a continuous frequency distribution. In this
case, the mid-value of each class interval, i.e. the average of its lower limit and upper limit, is taken
as the representative value of the class.

The formula for calculating the arithmetic mean (A.M.) in continuous series is the same as in discrete
series, but in continuous series X is the mid-value of the class intervals.

a) Direct Method
Let X denote the mid-values of the class intervals and f their corresponding frequencies. Then

$$\text{A.M. } (\bar{X}) = \frac{\sum fX}{N}$$

where, X is the mid-point of the class interval, $N = \sum f$ = Total frequency
b) Short-cut method or assumed mean method or change of origin method or deviation method 
The formula for calculating the arithmetic mean (A.M.) by this method is given by

$$\bar{X} = A + \frac{\sum fd}{N}$$

where, A = Assumed mean, X = Mid-value of the class intervals, $N = \sum f$ = Total frequency
d = X - A = Deviation of the items from the assumed mean 'A'.


c) Step-deviation Method or Change of Origin & Scale Method or Coding Method 
The formula for calculating A.M. by this method is given by 


$$\bar{X} = A + \frac{\sum fd'}{N} \times h$$

where, $d' = \frac{X - A}{h}$, A = Assumed mean, X = Mid-points of class intervals, $N = \sum f$ = Total frequency
h = Class size or class width or common factor

Example 2.7 (Direct Method): Compute the A.M. for the following data by using direct method.

Weights (kg) | 0-10 | 10-20 | 20-30 | 30-40 | 40-50
No. of students | 4 | 6 | 8 | 5 | 3

Solution: Computation of A.M.

Weights (kg) | No. of students (f) | Mid value (x) | fx
0-10 | 4 | 5 | 20
10-20 | 6 | 15 | 90
20-30 | 8 | 25 | 200
30-40 | 5 | 35 | 175
40-50 | 3 | 45 | 135
Total | $N = \sum f = 26$ | | $\sum fx = 620$

Mean $(\bar{x}) = \frac{\sum fx}{N} = \frac{620}{26} = 23.85$ (For use of calculator see unit 8)
Example 2.8 (Short cut Method): Compute the arithmetic mean for the following data.

Weights (kg) | 0-10 | 10-20 | 20-30 | 30-40 | 40-50
No. of students | 4 | 6 | 8 | 5 | 3

Solution: Computation of A.M.

Weights (kg) | No. of students (f) | Mid value (x) | d = x - A = x - 25 | fd
0-10 | 4 | 5 | -20 | -80
10-20 | 6 | 15 | -10 | -60
20-30 | 8 | 25 | 0 | 0
30-40 | 5 | 35 | 10 | 50
40-50 | 3 | 45 | 20 | 60
Total | $\sum f = 26$ | | | $\sum fd = -30$

Mean $(\bar{x}) = A + \frac{\sum fd}{N} = 25 + \frac{(-30)}{26} = 25 - 1.15 = 23.85$
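For grouped data the only extra step is replacing each class by its mid-value. The sketch below, with helper names assumed for illustration, reproduces the weight example by the direct method and by the step-deviation method (A = 25, h = 10).

```python
def class_midpoints(classes):
    """Mid-value of each class: average of its lower and upper limits."""
    return [(lower + upper) / 2 for lower, upper in classes]


def grouped_mean_direct(classes, freqs):
    """Direct method on mid-values: sum(f * X) / N."""
    mids = class_midpoints(classes)
    return sum(f * x for f, x in zip(freqs, mids)) / sum(freqs)


def grouped_mean_step(classes, freqs, assumed_mean, h):
    """Step-deviation method: A + sum(f * d') / N * h, d' = (X - A) / h."""
    mids = class_midpoints(classes)
    total_fd = sum(f * (x - assumed_mean) / h for f, x in zip(freqs, mids))
    return assumed_mean + total_fd / sum(freqs) * h


weight_classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]  # kg
students = [4, 6, 8, 5, 3]                                          # N = 26

print(class_midpoints(weight_classes))
print(round(grouped_mean_direct(weight_classes, students), 2))
print(round(grouped_mean_step(weight_classes, students, 25, 10), 2))
```

The mean of grouped data is an approximation, because every observation in a class is treated as if it sat at the class mid-value.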
Example 2.9 (Step Deviation Method): Calculate A.M.

Wages (Rs.) | 25-30 | 30-35 | 35-40 | 40-45 | 45-50 | 50-55 | 55-60 | 60-65 | 65-70
No. of Workers | 10 | 13 | 18 | 21 | 24 | 28 | 20 | 11 | 8

Solution: Calculation of average wages using changing origin and scale of data
Let Assumed Mean (A) = 47.5, h = 5

Wages (Rs.) | Mid value (X) | d' = (X - 47.5)/5 | f | fd'
25-30 | 27.5 | -4 | 10 | -40
30-35 | 32.5 | -3 | 13 | -39
35-40 | 37.5 | -2 | 18 | -36
40-45 | 42.5 | -1 | 21 | -21
45-50 | 47.5 | 0 | 24 | 0
50-55 | 52.5 | 1 | 28 | 28
55-60 | 57.5 | 2 | 20 | 40
60-65 | 62.5 | 3 | 11 | 33
65-70 | 67.5 | 4 | 8 | 32
Total | | | $N = \sum f = 153$ | $\sum fd' = -3$

We have,

Mean wage $(\bar{X}) = A + \frac{\sum fd'}{N} \times h = 47.5 + \frac{(-3)}{153} \times 5 = 47.5 - 0.098 = 47.40$

Hence, mean wage is Rs. 47.40


Calculation of A.M. in Case of Open End Classes 


Classes in a frequency distribution for which the lower limit of the first class or the upper limit of the
last class or both are unknown (i.e. not specified) are known as open end classes. In case of open end
classes, we cannot find the A.M. unless we make an assumption about the unknown limits. To estimate the
lower limit of the first class and the upper limit of the last class, the assumption depends upon the class
interval following the first class and the class interval preceding the last class.
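One common way to apply this, shown here only as a sketch under a stated assumption, is to give an open class the same width as its neighbouring class. The function name, the use of `None` for an unspecified limit and the sample distribution are all hypothetical.

```python
def close_open_classes(classes):
    """Fill unspecified limits (None) of the first and last class.

    Assumption for this sketch: an open class is given the same width
    as the class next to it.
    """
    closed = list(classes)
    first_lower, first_upper = closed[0]
    if first_lower is None:
        next_lower, next_upper = closed[1]
        closed[0] = (first_upper - (next_upper - next_lower), first_upper)
    last_lower, last_upper = closed[-1]
    if last_upper is None:
        prev_lower, prev_upper = closed[-2]
        closed[-1] = (last_lower, last_lower + (prev_upper - prev_lower))
    return closed


def grouped_mean(classes, freqs):
    """A.M. of grouped data from class mid-values: sum(f * X) / N."""
    mids = [(lower + upper) / 2 for lower, upper in classes]
    return sum(f * x for f, x in zip(freqs, mids)) / sum(freqs)


# Hypothetical distribution: "below 10", 10-20, 20-30, "30 and above".
open_classes = [(None, 10), (10, 20), (20, 30), (30, None)]
frequencies = [3, 5, 8, 4]

closed_classes = close_open_classes(open_classes)
print(closed_classes)
print(grouped_mean(closed_classes, frequencies))
```

The resulting mean depends on the assumed limits, so a different assumption about the open classes gives a different A.M.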


Calculation of A.M. in Case of Cumulative Frequency Distribution 
(Less than and more than cumulative frequency distribution) 


The following data shows the life time in hours of 400 tube lights. Find the mean life time.

Life time (in hrs), less than | 400 | 500 | 600 | 700 | 800 | 900 | 1000 | 1100 | 1200
No. of tube lights | 20 | 60 | 116 | 194 | 265 | 324 | 374 | 392 | 400

Solution: The given frequency distribution is a less than cumulative frequency distribution, so it
should first be converted into an ordinary frequency distribution.

Life time (in hrs.) | No. of tubes (f) | Mid. value (X) | d' = (X - 750)/100 | fd'
300 - 400 | 20 - 0 = 20 | 350 | -4 | -80
400 - 500 | 60 - 20 = 40 | 450 | -3 | -120
500 - 600 | 116 - 60 = 56 | 550 | -2 | -112
600 - 700 | 194 - 116 = 78 | 650 | -1 | -78
700 - 800 | 265 - 194 = 71 | 750 | 0 | 0
800 - 900 | 324 - 265 = 59 | 850 | 1 | 59
900 - 1000 | 374 - 324 = 50 | 950 | 2 | 100
1000 - 1100 | 392 - 374 = 18 | 1050 | 3 | 54
1100 - 1200 | 400 - 392 = 8 | 1150 | 4 | 32
 | N = Σf = 400 | | | Σfd' = -145

Here, A = 750, h = 100
$\bar{X} = A + \frac{\sum fd'}{N} \times h = 750 + \frac{-145}{400} \times 100 = 750 - 36.25 = 713.75$ hrs.


Example 2.11 | The following table represents the marks of 100 students.

Marks (More than) | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90
No. of students | 100 | 95 | 80 | 60 | 55 | 50 | 30 | 15

Find the mean marks of all 100 students.

Solution: The given frequency distribution is a more than cumulative frequency distribution, so it
should first be converted into an ordinary frequency distribution.

Marks | No. of students (f) | Mid value (X) | d' = (X - 55)/10 | fd'
20 - 30 | 100 - 95 = 5 | 25 | -3 | -15
30 - 40 | 95 - 80 = 15 | 35 | -2 | -30
40 - 50 | 80 - 60 = 20 | 45 | -1 | -20
50 - 60 | 60 - 55 = 5 | 55 | 0 | 0
60 - 70 | 55 - 50 = 5 | 65 | 1 | 5
70 - 80 | 50 - 30 = 20 | 75 | 2 | 40
80 - 90 | 30 - 15 = 15 | 85 | 3 | 45
90 - 100 | 15 | 95 | 4 | 60
 | N = Σf = 100 | | | Σfd' = 85

Here, A = 55, h = 10
We have, $\bar{X} = A + \frac{\sum fd'}{N} \times h = 55 + \frac{85}{100} \times 10 = 55 + 8.5 = 63.5$
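
The conversion from a less than cumulative distribution to ordinary frequencies, followed by the mean, can be sketched in Python. This is only an illustrative sketch: the function name `mean_from_less_than_cf` and its parameters are assumptions made here, not part of the text, and it uses the direct formula Σfx/N, which gives the same value as the step deviation method.

```python
def mean_from_less_than_cf(upper_limits, cum_freq, first_lower):
    """Mean of a 'less than' cumulative frequency distribution.

    upper_limits: upper class limits in ascending order
    cum_freq:     less-than cumulative frequency for each upper limit
    first_lower:  lower limit of the first class
    """
    lowers = [first_lower] + list(upper_limits[:-1])
    # Ordinary frequency of a class = its c.f. minus the preceding c.f.
    freqs = [c - p for c, p in zip(cum_freq, [0] + list(cum_freq[:-1]))]
    mids = [(lo + up) / 2 for lo, up in zip(lowers, upper_limits)]
    return sum(f * x for f, x in zip(freqs, mids)) / sum(freqs)


# Tube light data from the worked example above
upper = [400, 500, 600, 700, 800, 900, 1000, 1100, 1200]
cf = [20, 60, 116, 194, 265, 324, 374, 392, 400]
print(mean_from_less_than_cf(upper, cf, first_lower=300))  # 713.75
```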
Note:
1. For A.M., it is not necessary to convert unequal class intervals into equal class intervals. But in
case of unequal class intervals, h is taken as the common factor of the mid-values of the classes.
2. It is also not necessary to convert inclusive class intervals into exclusive class intervals, because
the mid-points remain the same whether the classes are inclusive or exclusive.


Properties of Arithmetic Mean 


i) The algebraic sum of the deviations of the given data from their arithmetic mean $(\bar{X})$ is zero.
That is, $\sum (X - \bar{X}) = 0$ and $\sum f(X - \bar{X}) = 0$.
ii) The sum of the squares of the deviations of the given data taken from their A.M. is always
minimum.
iii) The sum of the given data is equal to the product of the number of observations and their arithmetic mean.
That is, $\sum X = n\bar{X}$ and $\sum fX = N\bar{X}$.
iv) If two different series having $n_1$ and $n_2$ observations have respective arithmetic means
$\bar{X}_1$ and $\bar{X}_2$, then the combined arithmetic mean $\bar{X}_{12}$ of the combined series is given by

$\bar{X}_{12} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$

v) A.M. is dependent on change of origin and scale. That is, the A.M. is affected when the
original variable X is changed into another variable Y by a change of origin, say a, and of scale,
say b. Therefore, if $Y = a + bX$, then the A.M. of Y is given by

$\bar{Y} = a + b\bar{X}$.
Merits and Demerits of Arithmetic Mean
Merits
a. [illegible in the source]
b. It is easy to calculate and understand.
c. [illegible in the source]
d. It includes the extremities of the series.
e. The arithmetic mean is a stable average, so it is least affected by fluctuations of sampling.
f. It is suitable for further mathematical treatment, e.g. the combined mean discussed above.

Demerits

a. It is much affected by the extremities of the data.

b. It cannot be used in the case of open end classes such as less than 20, more than 100 etc.

c. It cannot be determined by the method of inspection.

d. It cannot be located graphically.

e. It cannot be calculated if even a single observation (item) is lost.

f. It cannot be used if we are dealing with qualitative characteristics which cannot be measured
quantitatively.

g. It may give a meaningless result and lead to a wrong conclusion if a very small or a very big item is
included (or the data is highly skewed).


2.5 Weighted Arithmetic Mean 


The simple arithmetic mean is based on the assumption that all the items in the
distribution are equally important. But in practice, this may not be so: some items in a distribution
are relatively more important than others. In such cases, a proper weight
(priority/merit) is to be given to the various items, the weight given to each item being proportional
to the importance and value of the item in the distribution. When weights are assigned to the individual
items according to their relative importance or priority (or worth), the arithmetic mean calculated with
respect to these weights is called the weighted arithmetic mean.


Let $W_1, W_2, W_3, \ldots, W_n$ be the weights assigned to the variate values $X_1, X_2, X_3, \ldots, X_n$ respectively,
according to their importance. Then the weighted arithmetic mean, usually denoted by $\bar{X}_w$, is given by


$\bar{X}_w = \frac{W_1 X_1 + W_2 X_2 + W_3 X_3 + \cdots + W_n X_n}{W_1 + W_2 + W_3 + \cdots + W_n} = \frac{\sum WX}{\sum W}$


where, W = given weight


Example 2.12 | An enquiry into the budgets of middle class families gave the following
information.

Group | Food | Rent | Clothing | Fuel | Others
Weight | 10% | 15% | 20% | 15% | 25%
Index number | 90 | 100 | 85 | 65 | 137

Compute the weighted A.M.


Solution: We have, $\bar{X}_w = \frac{\sum WX}{\sum W}$


Calculation of weighted average

Group | Weights (W) | Index No. (X) | WX
Food | 10 | 90 | 900
Rent | 15 | 100 | 1500
Clothing | 20 | 85 | 1700
Fuel | 15 | 65 | 975
Others | 25 | 137 | 3425
 | ΣW = 85 | | ΣWX = 8500

Weighted mean $\bar{X}_w = \frac{\sum WX}{\sum W} = \frac{8500}{85} = 100$

Example 2.13 | The postal service handles six basic types of letters and cards. The mail is given in the table.


Type of mailing | Ounces (W) | Price per ounce (in Rs.)
Air mail | 15000 | 0.12
First class | 77000 | 0.15
Second class | 181000 | 0.11
Third class | 16000 | 0.05
Registered | 1000 | 0.40
Certified | 500 | 0.45

Find out the average revenue per ounce for these services.

Solution: Calculation of average revenue

Type of mailing | Ounces (W) | Price per ounce (X) | WX
Air mail | 15000 | 0.12 | 1800
First class | 77000 | 0.15 | 11550
Second class | 181000 | 0.11 | 19910
Third class | 16000 | 0.05 | 800
Registered | 1000 | 0.40 | 400
Certified | 500 | 0.45 | 225
 | ΣW = 290500 | | ΣWX = 34685

$\bar{X}_w = \frac{\sum WX}{\sum W} = \frac{34685}{290500} = 0.1194$

Thus, the average revenue per ounce is Rs. 0.1194.
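
The weighted mean formula ΣWX/ΣW used in the two examples above can be sketched in Python. The function name `weighted_mean` is an illustrative assumption; the data are the index numbers and weights from the solution table of Example 2.12.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(W * X) / sum(W)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Example 2.12: index numbers (X) and weights (W)
index_numbers = [90, 100, 85, 65, 137]
weights = [10, 15, 20, 15, 25]
print(weighted_mean(index_numbers, weights))  # 100.0
```

When all weights are equal the result reduces to the simple arithmetic mean.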


Combined Mean or Mean of Combined Series 
Let $\bar{X}_1$ and $\bar{X}_2$ be the arithmetic means of two series of sizes $n_1$ and $n_2$ respectively. Then the
combined mean of the two series of size $(n_1 + n_2)$ is denoted by $\bar{X}_{12}$ and is given by

$\bar{X}_{12} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$, and so on.

For three sets of data, the combined mean $(\bar{X}_{123})$ is given by

$\bar{X}_{123} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2 + n_3 \bar{X}_3}{n_1 + n_2 + n_3}$


Example 2.14 | From the information given below, what is the average daily wage for the workers of the two
factories?

 | Factory A | Factory B
No. of wage earners | 35 | 20
Average daily wage | Rs. 200 | Rs. 250

Solution: Factory A: $n_1 = 35$, $\bar{x}_1$ = Rs. 200, $\sum x_1 = ?$

So, $\bar{x}_1 = \frac{\sum x_1}{n_1} \Rightarrow \sum x_1 = n_1 \bar{x}_1 = 35 \times 200$ = Rs. 7,000

Factory B: $n_2 = 20$, $\bar{x}_2$ = Rs. 250, $\sum x_2 = ?$

So, $\bar{x}_2 = \frac{\sum x_2}{n_2} \Rightarrow \sum x_2 = n_2 \bar{x}_2 = 20 \times 250$ = Rs. 5,000

Combined mean $\bar{x}_{12} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} = \frac{35 \times 200 + 20 \times 250}{35 + 20} = \frac{12000}{55}$ = Rs. 218.18
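
The combined mean formula extends to any number of series. A minimal Python sketch follows; the function name `combined_mean` is an assumption made for illustration, and the check uses the two factories of Example 2.14.

```python
def combined_mean(sizes, means):
    """Combined mean of several series from their sizes and means."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# Example 2.14: Factory A (35 workers, Rs. 200), Factory B (20 workers, Rs. 250)
print(round(combined_mean([35, 20], [200, 250]), 2))  # 218.18
```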


Example 2.15 | In a class of 50 students 15 have failed and their average marks is 10. The total marks
secured by the entire class were 881. Find the average marks of those who have passed.

Solution: Given,

Passed | Failed | Class
$n_1 = 50 - 15 = 35$, $\bar{x}_1 = ?$ | $n_2 = 15$, $\bar{x}_2 = 10$ | $n_1 + n_2 = 50$, $\sum x_{12} = 881$, $\bar{x}_{12} = \frac{881}{50} = 17.62$

$\bar{x}_{12} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$

$17.62 = \frac{35 \bar{x}_1 + 15 \times 10}{50} \Rightarrow 35 \bar{x}_1 = 881 - 150 = 731 \Rightarrow \bar{x}_1 = 20.89$

The average marks of those who have passed = 20.89, i.e. approximately 21.


Corrected Mean 
The formula for the calculation of the corrected mean is given by

Corrected mean $(\bar{X}_{corrected}) = \frac{\text{Correct } \sum X}{\text{Correct } n}$

where, Correct ΣX = Incorrect ΣX - Incorrect items + Correct items
Incorrect ΣX = $n\bar{X}$ = n × Incorrect mean

If any item or items is/are to be omitted, then
Correct ΣX = Incorrect ΣX - Incorrect item/items
Correct n = Total items - Number of omitted item/items.

Sometimes any item or items is/are missed; in those cases,
Correct ΣX = Incorrect ΣX + Missed item/items
Correct n = Total n + Number of missed item/items.

Example 2.16 | Mean of 100 items was 50. Later on, it was found that two items were misread as 60 and
8 instead of 192 and 66. Find the correct mean.

Solution: Given: No. of items (n) = 100, Mean $(\bar{X})$ = 50
Misread items = 60 and 8
Correct items = 192 and 66

We have, $\bar{X} = \frac{\sum X}{n} \Rightarrow 50 = \frac{\sum X}{100} \Rightarrow \sum X = 5000$

Incorrect ΣX = 5000
Correct n = 100 - 1 - 1 + 1 + 1 = 100
Correct ΣX = 5,000 - 60 - 8 + 192 + 66 = 5190

Correct $(\bar{X}) = \frac{\text{Correct } \sum X}{\text{Correct } n} = \frac{5190}{100} = 51.9$


Example 2.17 | Arithmetic mean of 150 items is 50. Two items 40 and 60 were left out at the time of
calculation. What is the correct mean of all the items?


Solution: Given, mean $(\bar{x})$ = 50, No. of items (n) = 150
Left out items = 40 and 60

Required mean of all the items (Corrected $\bar{x}$) = ?
We have, $\bar{x} = \frac{\sum x}{n} \Rightarrow 50 = \frac{\sum x}{150} \Rightarrow \sum x = 50 \times 150 = 7500$
Σx without including the left out items = 7500
i.e. Wrong Σx = 7500

Correct n = 150 + 1 + 1 = 152

Correct Σx = 7500 + 40 + 60 = 7600

So, Correct $\bar{x} = \frac{\text{Correct } \sum x}{\text{Correct } n} = \frac{7600}{152} = 50$


Example 2.18 | The mean marks of 100 students were found to be 65. At the time of checking it was
found that the three marks 40, 50 and 55 were incorrect. Find the correct mean if the incorrect marks
are omitted (or weeded out).


Solution: Given, mean $(\bar{x})$ = 65, No. of students (n) = 100
Incorrect marks = 40, 50 and 55

Correct mean after omitting incorrect marks $(\bar{x})$ = ?

We have, $\bar{x} = \frac{\sum x}{n} \Rightarrow 65 = \frac{\sum x}{100} \Rightarrow \sum x = 65 \times 100 = 6500$

Σx without omitting incorrect marks = 6500
i.e., incorrect Σx = 6500
Correct n = 100 - 1 - 1 - 1 = 97
Correct Σx = 6500 - 40 - 50 - 55 = 6355

Correct $(\bar{x}) = \frac{\text{Correct } \sum x}{\text{Correct } n} = \frac{6355}{97} = 65.515$
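
The three correction cases (misread items, missed items, omitted items) can be combined in one small Python sketch. The function `corrected_mean` and its keyword names `wrong`, `right`, `drop` and `add` are hypothetical names chosen here for illustration; the calls reproduce the figures of the three corrected mean examples above.

```python
def corrected_mean(n, mean, wrong=(), right=(), drop=(), add=()):
    """Corrected mean from a reported mean of n items.

    wrong/right: misread values and their correct replacements
    drop:        values to be omitted from the data
    add:         values that were left out and must be included
    """
    total = n * mean - sum(wrong) + sum(right) - sum(drop) + sum(add)
    count = n - len(drop) + len(add)
    return total / count


print(corrected_mean(100, 50, wrong=[60, 8], right=[192, 66]))  # 51.9
print(corrected_mean(150, 50, add=[40, 60]))                    # 50.0
print(round(corrected_mean(100, 65, drop=[40, 50, 55]), 3))     # 65.515
```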


2.6 Geometric Mean (G.M.) 


The geometric mean is usually abbreviated as G.M. If a series consists of n observations,
then the G.M. is the nth root of the product of the n observations. Sometimes, when we are dealing with
quantities that change over a period of time, we need to know an average rate of change, such as an
average growth rate over a period of several years. In such cases, the simple arithmetic mean is not
appropriate because it gives an overestimate. What we need in order to find the correct central
value is the geometric mean, simply called the G.M.


(a) In Individual series,
Let $X_1, X_2, \ldots, X_n$ be n observations, then G.M. (G) is given by
$G = (X_1 X_2 \cdots X_n)^{1/n}$
After simplification,

G.M. = Antilog $\left(\frac{\sum \log X}{n}\right)$ where n = Number of observations, X = Variate values


Example 2.19 | Find out the G.M. of the given data.

Days | 1 | 2 | 3 | 4 | 5
Growth | 1.07 | 1.08 | 1.10 | 1.12 | 1.18

Solution: Calculation of geometric mean

Days | Growth (x) | log x
1 | 1.07 | 0.029384
2 | 1.08 | 0.033424
3 | 1.10 | 0.041393
4 | 1.12 | 0.049218
5 | 1.18 | 0.071882
 | | Σ log x = 0.2253

Geometric mean (G.M.) = Antilog $\left(\frac{\sum \log x}{n}\right)$ = Antilog $\left(\frac{0.2253}{5}\right)$ = 1.1093

Average growth factor = 1.1093, i.e. an average growth rate of 10.93% per day.
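
The antilog formula for an individual series can be sketched in Python with base-10 logarithms from the standard `math` module. The function name `geometric_mean` is an illustrative assumption; the data are the five growth factors of Example 2.19.

```python
import math


def geometric_mean(values):
    """G.M. = antilog(sum(log10 x) / n) for positive values."""
    return 10 ** (sum(math.log10(x) for x in values) / len(values))


g = geometric_mean([1.07, 1.08, 1.10, 1.12, 1.18])
print(round(g, 4))  # 1.1093
```

Working through logarithms rather than multiplying all the values first keeps the intermediate numbers small when the series is long.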
(b) In discrete series,

If $X_1, X_2, X_3, \ldots, X_n$ are the numerical values of the variable and $f_1, f_2, f_3, \ldots, f_n$ are their respective
frequencies, then the geometric mean of the data is given by

G.M. = $\left(X_1^{f_1} X_2^{f_2} X_3^{f_3} \cdots X_n^{f_n}\right)^{1/N}$
After simplification,

G = Antilog $\left(\frac{\sum f \log X}{N}\right)$ where N = Σf


Uses of Geometric Mean 
(a) To find the average rate of population growth and the rate of interest.
(b) To find the average rate of profit.
(c) It is used for the construction of Index Numbers.
(d) It is the most appropriate average to be used in cases where it is desired to give more
weightage to smaller items and less weightage to larger items.
(e) It is especially useful in averaging ratios, percentages and rates of increase between two
periods.


2.7 Harmonic Mean (H.M.) 


When dealing with average speed or average velocity, or more generally when time is the independent variable,
the H.M. is to be calculated. The H.M. is defined as the reciprocal of the A.M. of the reciprocals of a set of
non-zero variate values.


(a) In Individual series: Let $X_1, X_2, \ldots, X_n$ be n observations, then H.M. (H) is given by

$H = \frac{n}{\sum \frac{1}{X}}$

Example 2.20 | A toy factory has assigned a group of four workers to complete an order of 1400 toys of a
certain type. The productive rates of the four workers are respectively 4, 6, 10 and 15 minutes per
toy. Calculate the average minutes per toy for the group of workers by using the harmonic mean.


Solution:
Production rate (minutes per toy) (X) | 1/X
4 | 0.25
6 | 0.1667
10 | 0.1
15 | 0.0667
 | Σ(1/X) = 0.5834

H.M. = $\frac{n}{\sum \frac{1}{X}} = \frac{4}{0.5834}$ = 6.856 minutes per toy.
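
The definition of the H.M. as the reciprocal of the mean of reciprocals can be sketched in Python. The function name `harmonic_mean` is an assumption for illustration. With unrounded reciprocals the result for Example 2.20 is 6.857; the hand computation gives 6.856 because the reciprocals are rounded to four decimal places.

```python
def harmonic_mean(values):
    """H.M. = n / sum(1/x) for non-zero values."""
    return len(values) / sum(1 / x for x in values)


# Example 2.20: minutes per toy for the four workers
print(round(harmonic_mean([4, 6, 10, 15]), 3))  # 6.857
```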




(b) In Discrete series: Let $X_1, X_2, X_3, \ldots, X_n$ be the observations with corresponding frequencies $f_1,
f_2, f_3, \ldots, f_n$. Then the Harmonic Mean (H) of the N observations altogether is given by

$H = \frac{1}{\frac{1}{N}\left(\frac{f_1}{X_1} + \frac{f_2}{X_2} + \cdots + \frac{f_n}{X_n}\right)} = \frac{N}{\sum \frac{f}{X}}$ where, N = Σf


Weighted Harmonic Mean 
If the speeds of travelling and the distances travelled at the various speeds vary, then the weighted H.M. is
calculated. It is given by $H_w = \frac{\sum W}{\sum \frac{W}{X}}$, where W is the weight assigned to the variable X.

Example 2.21 | A man travels the first 900 km of his journey by train at an average speed of 50 km per
hour, the next 2000 km by plane at an average speed of 300 km per hour and 20 km by bus at an
average speed of 30 km per hour. What is the average speed for the entire journey?

Solution: Since different distances are travelled at different speeds, the weighted harmonic mean is used.
Calculation of average speed

Speed in km/hr (X) | Distance travelled (W) | 1/X | W/X
50 | 900 | 0.02 | 18
300 | 2000 | 0.00333 | 6.67
30 | 20 | 0.0333 | 0.67
 | ΣW = 2920 | | Σ(W/X) = 25.34

Weighted H.M. $(H_w) = \frac{\sum W}{\sum \frac{W}{X}} = \frac{2920}{25.34} = 115.23$

Therefore, the average speed for the entire journey is 115.23 km per hour.
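
The weighted harmonic mean ΣW/Σ(W/X) can be sketched in Python as follows. The function name `weighted_harmonic_mean` is an illustrative assumption; the data are the speeds and distances of Example 2.21. Dividing total distance 2920 by the tabulated Σ(W/X) = 25.34 gives 115.23; with unrounded W/X terms the value is 115.26.

```python
def weighted_harmonic_mean(values, weights):
    """Weighted H.M. = sum(W) / sum(W / X)."""
    return sum(weights) / sum(w / x for x, w in zip(values, weights))


speeds = [50, 300, 30]        # km per hour (X)
distances = [900, 2000, 20]   # km travelled at each speed (W)
print(round(weighted_harmonic_mean(speeds, distances), 2))  # 115.26
```

The result is simply total distance divided by total travelling time, which is why the harmonic mean is the natural average for speeds.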
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Merits and Demerits of Harmonic Mean 
Merits:
a. It is based upon all the observations.
b. It is rigidly defined.
c. It is suitable for further mathematical calculation.
d. Let $H_1$ be the H.M. of $n_1$ observations and $H_2$ be the H.M. of $n_2$ observations; then the combined
H.M. (H) of the $n_1 + n_2$ observations is given by
$\frac{n_1 + n_2}{H} = \frac{n_1}{H_1} + \frac{n_2}{H_2}$
e. It is not much affected by fluctuations of sampling.
f. It gives greater importance to small items and is useful only when small items have to be given
a greater weight.
g. It can be used in constructing Human Development Indices (HDI).
h. It is used to find a rate where time is present as an independent variable.

Demerits:
a. It is not as easy to calculate or as simple to understand as the A.M.
b. If any of the observations is zero, the H.M. cannot be computed, because the reciprocal of zero is undefined.


Use of H.M.: It is the most appropriate method of finding the average speed of vehicles, airplanes etc.


Relation between A.M., G.M. and H.M.


The A.M., G.M. and H.M. of a series of positive observations are connected by the relationships:
(i) $(G.M.)^2 = A.M. \times H.M.$ (for two observations) and (ii) $A.M. \geq G.M. \geq H.M.$, with equality only when all the observations are equal.
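
Both relations can be checked numerically with a short Python sketch. The helper `three_means` is a hypothetical name introduced here; the first data set is taken from Example 2.20 and the pair (4, 9) is an arbitrary illustration of relation (i), which holds exactly for two positive observations.

```python
import math


def three_means(values):
    """Return (A.M., G.M., H.M.) of a list of positive values."""
    n = len(values)
    am = sum(values) / n
    gm = math.exp(sum(math.log(x) for x in values) / n)
    hm = n / sum(1 / x for x in values)
    return am, gm, hm


am, gm, hm = three_means([4, 6, 10, 15])
print(am >= gm >= hm)                      # True

am2, gm2, hm2 = three_means([4, 9])
print(math.isclose(gm2 ** 2, am2 * hm2))   # True
```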


2.8 Median ($M_d$)


Median is a positional average which divides the whole distribution of data into a lower 50% and an
upper 50%. It is also called the middle value of the given distribution. It is quite different from the mean, as the
median describes the position of the variable in the distribution. Thus the median is the value which divides
the distribution of values (arranged in ascending or descending order) into two equal parts.


The median divides the total number of observations into two equal parts such that 50% of the items lie
above the median and 50% of the items lie below it. Its value depends on the position occupied
by a value in the frequency distribution, so it is also called a "positional average". It is denoted by $M_d$.

Median is another descriptive statistical measure used for the central value. It is a suitable measure of
central tendency (or average) for qualitative characteristics such as knowledge, intelligence, beauty,
honesty and talent, or gradings such as good, bad and defective. It is also a more appropriate (computable)
average for open ended data distributions.


Considerations of Computing Median

There are certain conditions under which the median can be calculated. They are as follows:

a. The items (observations) should be arranged in ascending or descending order according
to the magnitude of their values.

b. The frequency distribution of the data should be continuous with 'exclusive type class
intervals', for example 0 - 10, 10 - 20, 20 - 30, ..., etc.

c. For the computation of the median, the classes in the continuous series may be
unequal and open ended.

d. Median is the only average to be used while dealing with qualitative characteristics that can still be arranged in
order of magnitude, for example, in computing an average of beauty, honesty, intelligence etc.

Calculation of Median

(i) In Individual Series
The items (observations) should be arranged in ascending or descending order according to the
magnitude of their values.
If the number of observations is odd, the median is the middle value after arranging the data
in ascending or descending order. If the number of observations is even, there will be
two middle values, and the A.M. of the two middle values gives the median.


The formula for calculating the median in case of individual series is given by

Median ($M_d$) = Value of $\left(\frac{n+1}{2}\right)^{th}$ item, where n = Number of observations

(ii) In Discrete Series: The steps involved are:
a. Arrange the given data in ascending order of their magnitudes.
b. Obtain the less than cumulative frequency (c.f.).
c. The position of the median is given by the $\left(\frac{N+1}{2}\right)^{th}$ item where, N = Σf = Total frequency.
d. Look for the value of $\frac{N+1}{2}$ in the less than cumulative frequency column and note the
cumulative frequency either equal to or just greater than $\frac{N+1}{2}$.
e. The corresponding value of the variable gives the median.


(iii) In Continuous Series: The following steps are to be used in finding the median in case of
continuous series:
a. Prepare the less than cumulative frequency (c.f.) distribution.
b. Using the formula, the position of the median is given by the $\left(\frac{N}{2}\right)^{th}$ item where, N = Σf = Total
frequency.
c. See the cumulative frequency equal to or just greater than the value of $\frac{N}{2}$ and note the
corresponding class interval.
d. The corresponding class interval contains the median value and is called the median class.


Then, the median is computed by applying the following formula,

$M_d = L + \frac{\frac{N}{2} - c.f.}{f} \times h$

where, $\frac{N}{2}$ = Position of the median class
N = Total frequency
L = Lower limit of median class
f = Frequency of median class
h = Class size of median class or width of the median class
c.f. = Less than c.f. preceding the c.f. of the median class.
Note: 1. The classes should be of exclusive type to calculate the median from a continuous series of data.
2. The median can also be calculated for a distribution having unequal class intervals, i.e. for the
calculation of the median it is not necessary to make the class sizes equal unless it is stated that the data should be amended.


Example 2.22 | Find the median from the set of observations: 30, 40, 25, 18, 27, 26, 35


Solution: Arranging the given data, say in ascending order of magnitude:
18, 25, 26, 27, 30, 35, 40

Median ($M_d$) = Value of $\left(\frac{n+1}{2}\right)^{th}$ item = Value of $\left(\frac{7+1}{2}\right)^{th}$ item = Value of 4th item


Since the 4th item is 27, Median ($M_d$) = 27


Example 2.23 | Find the median from the following data: 48, 30, 40, 35, 27, 29, 38, 45


Solution: At first, arranging the given data in ascending order: 27, 29, 30, 35, 38, 40, 45, 48

Median = Value of $\left(\frac{n+1}{2}\right)^{th}$ item = Value of $\left(\frac{8+1}{2}\right)^{th}$ item = Value of (4.5)th item


Since the (4.5)th item is the average of the 4th and 5th items, the median is the average of 35 and 38. Thus,

Median = $\frac{35 + 38}{2}$ = 36.5


In another way,
Median = 4th item + 0.5 (5th item - 4th item) = 35 + 0.5 (38 - 35) = 36.5
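
The odd and even cases of the individual series rule can be sketched in Python. The function name `median` is chosen here for illustration; Python's standard library also offers `statistics.median`, which follows the same rule.

```python
def median(values):
    """Median of an individual series: the ((n + 1) / 2)th item."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                    # single middle item
    return (data[mid - 1] + data[mid]) / 2  # mean of the two middle items


print(median([30, 40, 25, 18, 27, 26, 35]))      # 27
print(median([48, 30, 40, 35, 27, 29, 38, 45]))  # 36.5
```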


Example 2.24 | The following data gives the daily wages of 80 workers in a firm. Calculate the median wage.

Daily wages (in '00' Rs.) | 2 | 7 | 8 | 10 | 15
Number of workers | 20 | 15 | 12 | 15 | 18

Solution: Calculation of median

Daily wages ('00' Rs.) (X) | No. of workers (f) | c.f.
2 | 20 | 20
7 | 15 | 35
8 | 12 | 47
10 | 15 | 62
15 | 18 | 80
 | N = 80 |

Median = Value of $\left(\frac{N+1}{2}\right)^{th}$ item = Value of $\left(\frac{80+1}{2}\right)^{th}$ item = Value of 40.5th item


From the c.f. table, the c.f. just greater than 40.5 is 47 and its corresponding value of X is 8. So,
Median wage ($M_d$) = 8 × Rs. 100 = Rs. 800
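
The cumulative frequency lookup for a discrete series can be sketched in Python using `itertools.accumulate` from the standard library. The function name `discrete_median` is an assumption for illustration; the data are those of Example 2.24 (wages in hundreds of rupees).

```python
from itertools import accumulate


def discrete_median(values, freqs):
    """Median of a discrete series via the less-than cumulative frequency."""
    pairs = sorted(zip(values, freqs))           # ascending order of X
    cum = list(accumulate(f for _, f in pairs))  # less-than c.f. column
    position = (cum[-1] + 1) / 2                 # (N + 1) / 2
    for (x, _), c in zip(pairs, cum):
        if c >= position:                        # c.f. equal to or just greater
            return x


print(discrete_median([2, 7, 8, 10, 15], [20, 15, 12, 15, 18]))  # 8
```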


Example 2.25 | Find the median of the given data:

Weight (in gm) | No. of oranges
40 - 45 | 14
45 - 50 | 20
50 - 55 | 42
55 - 60 | 54
60 - 65 | 45
65 - 70 | 18
70 - 75 | 7

Solution: Computation of median

Weight in gms | No. of oranges (f) | Cumulative frequency (c.f.)
40 - 45 | 14 | 14
45 - 50 | 20 | 34
50 - 55 | 42 | 76
55 - 60 | 54 | 130
60 - 65 | 45 | 175
65 - 70 | 18 | 193
70 - 75 | 7 | 200

Median = $\left(\frac{N}{2}\right)^{th}$ item = $\left(\frac{200}{2}\right)^{th}$ item = 100th item


From the table, the c.f. just greater than 100 is 130, which corresponds to the class (55 - 60). So the median
lies in the class (55 - 60), i.e. median class = (55 - 60).


Now we have, L = 55, f = 54, c.f. = 76, h = 5

Median ($M_d$) = $L + \frac{\frac{N}{2} - c.f.}{f} \times h = 55 + \frac{5}{54}(100 - 76) = 57.222$
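
The interpolation formula for a continuous series can be sketched in Python. The function name `grouped_median` and the representation of classes as (lower, upper) pairs are assumptions made here; the frequency 7 for the last class is taken as 200 - 193 from the cumulative frequency column of Example 2.25.

```python
def grouped_median(classes, freqs):
    """Median of a continuous series with exclusive classes.

    classes: list of (lower, upper) limits in ascending order
    freqs:   frequency of each class
    """
    target = sum(freqs) / 2          # N / 2
    cum_before = 0                   # c.f. preceding the current class
    for (lower, upper), f in zip(classes, freqs):
        if cum_before + f >= target:  # median class found
            return lower + (target - cum_before) / f * (upper - lower)
        cum_before += f


classes = [(40, 45), (45, 50), (50, 55), (55, 60), (60, 65), (65, 70), (70, 75)]
freqs = [14, 20, 42, 54, 45, 18, 7]
print(round(grouped_median(classes, freqs), 3))  # 57.222
```

Because the width of each class is read from its own limits, the same sketch also works for unequal class intervals.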


Example 2.26 | The following data gives the weekly wages in rupees of 24 workers of a firm.

Wages per week (in Rs. '000') | 10-14 | 15-19 | 20-24 | 25-29 | 30-34
Number of workers | 4 | 7 | 8 | 3 | 2

Compute the median wage.

Solution: Since the given class intervals are of inclusive type, we should first convert them
into exclusive class intervals by using the correction factor before calculating the median.

Correction factor = Half of the difference between the lower limit of the succeeding class and the upper
limit of the preceding class = $\frac{15 - 14}{2} = 0.5$

Reconstruction of the given data and calculation of median

Wages per week (in Rs. '000') | Workers (f) | c.f.
9.5 - 14.5 | 4 | 4
14.5 - 19.5 | 7 | 11
19.5 - 24.5 | 8 | 19
24.5 - 29.5 | 3 | 22
29.5 - 34.5 | 2 | 24

Position of the median is given by $\frac{N}{2} = \frac{24}{2} = 12$. From the c.f. table, the c.f. just greater than 12 is 19,
whose corresponding class is (19.5 - 24.5), so the median lies in the class (19.5 - 24.5).

$M_d = L + \frac{\frac{N}{2} - c.f.}{f} \times h = 19.5 + \frac{12 - 11}{8} \times 5 = 20.125$
Median wage = 20.125 × Rs. 1000 = Rs. 20,125

Merits and Demerits of Median


Merits:

a. It is the most appropriate average for open end classes.

b. It is clearly defined.

c. It is not affected by extreme observations.

d. It is easy to calculate and simple to understand.

e. It can be used for averaging qualitative characteristics, viz. knowledge, honesty,
intelligence, beauty etc.

f. It can be obtained graphically.

g. Sometimes it can be obtained by inspection only.

Demerits:

a. It is necessary to make an array of the data, i.e. to arrange the data in ascending or descending order.

b. It is not based on all the observations.

c. It is not suitable for further mathematical treatment.

d. The median cannot be determined exactly by directly using the place value formula in case of an even
number of observations for ungrouped data.

e. It is affected more by fluctuations of sampling as compared with the mean.


2.9 The Partition Values 


The values which divide the total number of observations into a number of equal parts are called
partition values. Thus, the median may also be regarded as a particular partition value because it divides the
given data into two equal parts.

Partition values are generally used for scaling and ranking test scores in psychological and
educational statistics. They are also useful in personnel work and productivity ratings. Partition values are
calculated by arranging the data in ascending or descending order of magnitude. Once the quantitative
data is ranked according to magnitude, it is possible to define various boundaries of the set.

Depending upon the number of equal parts, the important partition values which are
frequently used are

a. Quartiles b. Median c. Deciles d. Percentiles


2.9.1 Quartiles: 

Quartiles are those variate values which divide the total observations into four equal parts, each
part equal to 25%. Hence, there are three quartiles (points) $Q_1$, $Q_2$ and $Q_3$
such that $Q_1 < Q_2 < Q_3$, dividing the data into four equal parts. $Q_1$ is called the lower quartile (or first
quartile) and $Q_3$ is the upper quartile (or third quartile). 25% of the values are below $Q_1$, 25% of the values are
above $Q_3$ and the remaining 50% of the values lie between $Q_1$ and $Q_3$. Since $Q_2$ divides the series into two equal parts,
it is the same as the median. Generally, quartiles are used in economics and business and also in
determining the shape of a distribution.

[Figure: $Q_1$, $Q_2$ and $Q_3$ divide the ordered data into four parts of 25% each.]


Calculation of the Quartiles 


(i) In Individual Series: After arranging the given data in ascending order of magnitude, the quartiles
can be obtained by the following formula:

$i^{th}$ quartile ($Q_i$) = Value of $\frac{i(n+1)}{4}^{th}$ item, where i = 1, 2, 3

Example 2.27 | Find the lower and upper quartiles from the given data: 18, 20, 15, 16, 25, 19, 12, 22, 14


Solution: Here, number of observations, i.e. n = 9.
First the given data are arranged in ascending order: 12, 14, 15, 16, 18, 19, 20, 22, 25

Lower quartile ($Q_1$) = Value of $\left(\frac{n+1}{4}\right)^{th}$ item = Value of $\left(\frac{9+1}{4}\right)^{th}$ item = Value of 2.5th item
= Value of 2nd item + 0.5 (3rd - 2nd) item = 14 + 0.5 (15 - 14) = 14.5

Upper quartile ($Q_3$) = Value of $3\left(\frac{n+1}{4}\right)^{th}$ item = Value of $3\left(\frac{9+1}{4}\right)^{th}$ item
= Value of 7.5th item = Value of 7th item + 0.5 (8th - 7th) item
= 20 + 0.5 (22 - 20) = 21


(ii) In Discrete Series:

Quartile ($Q_i$) = Value of $\frac{i(N+1)}{4}^{th}$ item, i = 1, 2, 3; N = Σf = Total frequency

Steps involved are:

a. Arrange the given data in ascending order of their magnitudes.
b. Obtain the less than cumulative frequency (c.f.).
c. Position of the $i^{th}$ quartile = $\frac{i(N+1)}{4}^{th}$ item, where, N = Σf = Total frequency and i = 1, 2, 3.
d. Look for the value of $\frac{i(N+1)}{4}$ in the less than c.f. column and note the c.f.
either equal to or just greater than $\frac{i(N+1)}{4}$.
e. The value of the variable corresponding to that c.f. is the
quartile.


Example 2.28 | Find the first and third quartiles from the given data.

[The frequency table of X and f for this example is not legible in the source; N = Σf = 33.]

Solution: Calculation of quartiles

First quartile ($Q_1$) = Value of $\left(\frac{N+1}{4}\right)^{th}$ item = Value of $\left(\frac{33+1}{4}\right)^{th}$ item = Value of 8.5th item.
The c.f. just greater than 8.5 is 14 and the corresponding value of X is 3.
First quartile ($Q_1$) = 3

Third quartile ($Q_3$) = Value of $3\left(\frac{N+1}{4}\right)^{th}$ item = Value of $3\left(\frac{33+1}{4}\right)^{th}$ item
= Value of 25.5th item.
The c.f. just greater than 25.5 is 28 and the corresponding value of X is 5.
Third quartile ($Q_3$) = 5


(iii) In Continuous series or Grouped frequency distribution

The following steps are to be used in finding the quartile ($Q_i$) in case of continuous series:
a. Prepare the less than cumulative frequency (c.f.) distribution.
b. Using the formula, the position of the $i^{th}$ quartile is given by the $i\left(\frac{N}{4}\right)^{th}$ item, for i = 1, 2 and 3.
c. See the c.f. equal to or just greater than the value of $i\left(\frac{N}{4}\right)$ and note the corresponding class.
d. The corresponding class interval contains the $i^{th}$ quartile ($Q_i$) and is called the $i^{th}$ quartile class.
Then, the $i^{th}$ quartile ($Q_i$) is computed by applying the following formula,

$Q_i = L + \frac{\frac{iN}{4} - c.f.}{f} \times h$, where i = 1, 2, 3

Note: 1. The classes should be of exclusive type to calculate the quartiles for a continuous series of
data.
2. The $i^{th}$ quartile ($Q_i$) can also be calculated for a distribution having unequal class intervals,
i.e. for the calculation of the $i^{th}$ quartile it is not necessary to make the class sizes equal unless it is
stated that the data should be amended.


Example 2.29 | From the following data, calculate the first and third quartiles.

Wage in Rs. | 0-10 | 10-20 | 20-30 | 30-40 | 40-50
No. of workers | 45 | 85 | 160 | 75 | 35

Solution: Calculation of Quartiles

Wage in Rs. | No. of workers (f) | c.f.
0 - 10 | 45 | 45
10 - 20 | 85 | 130
20 - 30 | 160 | 290
30 - 40 | 75 | 365
40 - 50 | 35 | 400

For $Q_1$: The position of $Q_1$ is given by $\frac{N}{4} = \frac{400}{4} = 100$; the c.f. just greater than 100 is 130.
So, $Q_1$ lies in the class 10 - 20; L = 10, c.f. = 45, f = 85 and h = 10

Hence, $Q_1 = L + \frac{\frac{N}{4} - c.f.}{f} \times h = 10 + \frac{100 - 45}{85} \times 10 = 16.47$

For $Q_3$: $\frac{3N}{4} = \frac{3 \times 400}{4} = 300$; the c.f. just greater than 300 is 365.
So, $Q_3$ lies in the class (30 - 40); L = 30, c.f. = 290, f = 75 and h = 10

$Q_3 = L + \frac{\frac{3N}{4} - c.f.}{f} \times h = 30 + \frac{300 - 290}{75} \times 10 = 31.33$

2.9.2 Deciles

The variate values which divide the total number of observations into ten equal parts are called
deciles, and each part is equal to 10%. Hence, there are in all nine deciles, denoted by $D_1, D_2, \ldots, D_9$, such that
$D_1 < D_2 < D_3 < \cdots < D_9$. The number of items that lie between any two consecutive deciles is 10%, and the
items before $D_1$ and after $D_9$ are also 10% each. $D_5$, the fifth decile, divides the series into two halves and hence it is
the same as the median.

[Figure: $D_1, D_2, \ldots, D_9$ divide the ordered data into ten parts of 10% each.]

Calculation of Deciles 

(i) In Individual series
After arranging the given data in ascending order of magnitude,
the deciles can be obtained by the following formula

$D_i$ = Value of $\frac{i(n+1)}{10}^{th}$ item, where i = 1, 2, 3, ..., 9

Example 2.30 | Find the 4th and 7th deciles from the following data:
23, 34, 25, 40, 36 and 40
Solution: At first, arranging the data in ascending order: 23, 25, 34, 36, 40, 40

Position of the 4th decile ($D_4$) is given by $\frac{4(n+1)}{10} = \frac{4(6+1)}{10} = 2.8$

The value of the 4th decile ($D_4$) = 2nd item + 0.8 (3rd item - 2nd item)
= 25 + 0.8 (34 - 25) = 32.2

Position of the 7th decile ($D_7$) is given by $\frac{7(n+1)}{10} = \frac{7(6+1)}{10} = 4.9$
The value of the 7th decile ($D_7$) = 4th item + 0.9 (5th item - 4th item)
= 36 + 0.9 (40 - 36) = 39.6
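
The position rule i(n + 1)/k with linear interpolation between neighbouring items is the same for quartiles (k = 4), deciles (k = 10) and percentiles (k = 100), so it can be sketched once in Python. The function name `partition_value` is hypothetical, and clamping to the first or last item when the position falls outside the data is an assumption made here, not something stated in the text. For the data of Example 2.30 the sketch returns 32.2 for the 4th decile (position 2.8, i.e. 25 + 0.8 (34 - 25)) and 39.6 for the 7th decile (position 4.9, i.e. 36 + 0.9 (40 - 36)).

```python
def partition_value(values, i, parts):
    """i-th partition value of an individual series.

    parts = 4 gives quartiles, 10 deciles, 100 percentiles.
    """
    data = sorted(values)
    position = i * (len(data) + 1) / parts  # 1-based, may be fractional
    whole = int(position)                   # item just below the position
    frac = position - whole                 # fractional part for interpolation
    if whole < 1:                           # assumption: clamp at the ends
        return data[0]
    if whole >= len(data):
        return data[-1]
    return data[whole - 1] + frac * (data[whole] - data[whole - 1])


marks = [23, 34, 25, 40, 36, 40]
print(round(partition_value(marks, 4, 10), 2))  # 32.2
print(round(partition_value(marks, 7, 10), 2))  # 39.6
```

Calling the same function with `parts=4` on the data of Example 2.27 gives the quartiles 14.5 and 21.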
(ii) In Discrete series:

$D_i$ = Value of $\frac{i(N+1)}{10}^{th}$ item, where i = 1, 2, 3, ..., 9; N = Σf = Total frequency
Steps involved are:
a. Arrange the given data in ascending order of their magnitudes.
b. Obtain the less than cumulative frequency (c.f.).
c. Position of the $i^{th}$ decile = $\frac{i(N+1)}{10}^{th}$ item, where, N = Σf = Total frequency and i = 1, 2, 3, ..., 9.
d. Look for the value of $\frac{i(N+1)}{10}$ in the less than cumulative frequency column and note the
c.f. either equal to or just greater than $\frac{i(N+1)}{10}$.
e. The value of the variable corresponding to that c.f. is the decile.


Example 2.31 | Compute the 5th decile and the 9th decile from the following data:

Income/week (in Rs. 000) | 1.2 | 1.5 | 1.8 | 2.1 | 2.4 | 2.7
Number of employees | 7 | 11 | 16 | 10 | 6 | 2

Solution: Computation of 5th decile and 9th decile

Income/week (in Rs. 000) | No. of employees (f) | Cumulative frequency (c.f.)
1.2 | 7 | 7
1.5 | 11 | 18
1.8 | 16 | 34
2.1 | 10 | 44
2.4 | 6 | 50
2.7 | 2 | 52
Total | N = 52 |

Position of the 5th decile ($D_5$) is given by $\frac{5(N+1)}{10} = \frac{5(52+1)}{10} = 26.5$

From the c.f. table, the c.f. just greater than 26.5 is 34, which corresponds to the value 1.8 in the income
column. Thus 5th decile ($D_5$) = 1.8

5th decile ($D_5$) = 1.8 × Rs. 1,000 = Rs. 1,800

Position of the 9th decile ($D_9$) is given by $\frac{9(N+1)}{10} = \frac{9(52+1)}{10} = 47.7$
From the c.f. table, the c.f. just greater than 47.7 is 50, which corresponds to the value 2.4 in the
income column. Thus 9th decile ($D_9$) = 2.4
9th decile ($D_9$) = 2.4 × Rs. 1,000 = Rs. 2,400
(iii) In Continuous series or Grouped frequency distribution

Steps involved are:
a. Prepare the less than cumulative frequency (c.f.) distribution.
b. Using the formula, the position of the $i^{th}$ decile is given by the $i\left(\frac{N}{10}\right)^{th}$ item, for i = 1, 2, 3, ..., 9.
c. See the c.f. equal to or just greater than the value of $i\left(\frac{N}{10}\right)$ and note the corresponding class.
d. The corresponding class interval contains the $i^{th}$ decile ($D_i$) and is called the $i^{th}$ decile class.
Then, the $i^{th}$ decile ($D_i$) is computed by applying the following formula:

$D_i = L + \frac{\frac{iN}{10} - c.f.}{f} \times h$, where i = 1, 2, 3, ..., 9


Example: The following frequency distribution is the marks distribution of 20 students in an examination out of 100 full marks.

Marks obtained | 0-20 | 20-40 | 40-50 | 50-60 | 60-80 | 80-100
No. of students | 2 | 5 | 7 | 3 | 2 | 1

Calculate the 2nd decile and 8th decile.

Solution: Calculation of 2nd decile and 8th decile

Marks obtained | No. of students (f) | c.f.
0-20 | 2 | 2
20-40 | 5 | 7
40-50 | 7 | 14
50-60 | 3 | 17
60-80 | 2 | 19
80-100 | 1 | 20
Total | N = 20 |

Position of 2nd decile ($D_2$) $= 2\left(\frac{N}{10}\right)$th item $= 2\left(\frac{20}{10}\right)$th item = 4th item.

From the c.f. table, the c.f. just greater than 4 is 7, which corresponds to the class interval (20-40). Thus the 2nd decile lies in the class (20-40), with L = 20, c.f. = 2, f = 5 and h = 20.

Value of 2nd decile ($D_2$) $= L + \dfrac{\frac{2N}{10} - \text{c.f.}}{f} \times h = 20 + \dfrac{4 - 2}{5} \times 20 = 28$

Position of 8th decile ($D_8$) $= 8\left(\frac{N}{10}\right)$th item $= 8\left(\frac{20}{10}\right)$th item = 16th item.

From the c.f. table, the c.f. just greater than 16 is 17, which corresponds to the class interval (50-60). Thus the 8th decile lies in the class (50-60), with L = 50, c.f. = 14, f = 3 and h = 10.

Value of 8th decile ($D_8$) $= L + \dfrac{\frac{8N}{10} - \text{c.f.}}{f} \times h = 50 + \dfrac{16 - 14}{3} \times 10 = 56.67$
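The grouped-data decile procedure above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `grouped_decile` and the representation of classes as `(lower, upper)` pairs are assumptions made for the example, not notation from the text.

```python
def grouped_decile(classes, freqs, i):
    """Return the i-th decile (i = 1..9) of a grouped frequency distribution.

    classes: list of (lower, upper) class limits in ascending order.
    freqs:   class frequencies aligned with `classes`.
    """
    n = sum(freqs)
    position = i * n / 10            # position of the i-th decile: i(N/10)
    cum = 0                          # c.f. of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cum + f >= position:      # first class whose c.f. reaches the position
            return lower + (position - cum) / f * (upper - lower)
        cum += f
    raise ValueError("i must be between 1 and 9")


# Marks distribution of 20 students from the example above.
classes = [(0, 20), (20, 40), (40, 50), (50, 60), (60, 80), (80, 100)]
freqs = [2, 5, 7, 3, 2, 1]
d2 = grouped_decile(classes, freqs, 2)
d8 = grouped_decile(classes, freqs, 8)
print(round(d2, 2), round(d8, 2))
```

Replacing the divisor 10 by 4 or 100 would give quartiles or percentiles in the same way.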
2.9.3 Percentiles 


The variate values which divide the total number of observations into 100 equal parts are called percentiles. There are in all 99 percentiles, denoted by $P_1, P_2, \ldots, P_{99}$ respectively, such that $P_1 < P_2 < P_3 < \ldots < P_{99}$. The areas before $P_1$ and after $P_{99}$ are 1% each. Also, the area between any two consecutive percentiles is 1%, i.e. it represents $\frac{1}{100}$ part of the population. The 50th percentile is the same as the median.

[Figure: a distribution divided by $P_1, P_2, \ldots, P_{98}, P_{99}$ into 100 parts of 1% each]


Calculation of Percentiles 


(i) In Individual Series 


After arranging the given data in ascending order of magnitude, percentiles can be obtained by the following formula:

$P_i$ = value of the $\frac{i(n+1)}{100}$th item, where i = 1, 2, 3, ..., 99 and n = number of observations

[Example 2.33] Compute the 45th percentile from the following data:
12, 9, 13, 10, 9, 11, 14, 15, 17, 20, 12

Solution: At first, arrange the data in ascending order as below: 9, 9, 10, 11, 12, 12, 13, 14, 15, 17, 20

Position of the 45th percentile $= \frac{45(n+1)}{100}$th item $= \frac{45(11+1)}{100}$th item = 5.4th item

The value of the 45th percentile ($P_{45}$) = 5th item + 0.4 (6th item - 5th item) = 12 + 0.4 (12 - 12) = 12
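The individual-series rule can be checked with a short Python sketch. The helper name `partition_value` is hypothetical; it implements the $\frac{i(n+1)}{\text{parts}}$th-item rule with linear interpolation between neighbouring items, which is the convention used in this chapter and differs from some library defaults.

```python
def partition_value(data, i, parts):
    """Value of the i(n+1)/parts-th item of the sorted data.

    parts = 4 gives quartiles, 10 deciles, 100 percentiles.
    Fractional positions are interpolated between neighbouring items.
    """
    xs = sorted(data)
    n = len(xs)
    position = i * (n + 1) / parts
    k = int(position)                # whole part of the position
    frac = position - k              # fractional part of the position
    if k < 1:                        # position falls before the first item
        return xs[0]
    if k >= n:                       # position falls at or after the last item
        return xs[-1]
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])


data = [12, 9, 13, 10, 9, 11, 14, 15, 17, 20, 12]
p45 = partition_value(data, 45, 100)
print(p45)
```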
(ii) In Discrete series:

$P_i$ = value of the $\frac{i(N+1)}{100}$th item, where i = 1, 2, 3, ..., 99 and $N = \sum f$ = total frequency

Steps involved are:
a. Arrange the given data in ascending order of magnitude.
b. Obtain the less than cumulative frequency (c.f.).
c. Position of the i-th percentile $= \frac{i(N+1)}{100}$th item, where $N = \sum f$ and i = 1, 2, 3, ..., 99.
d. Look up the value of $\frac{i(N+1)}{100}$ in the less than cumulative frequency column and note the value corresponding to the cumulative frequency either equal to or just greater than $\frac{i(N+1)}{100}$.
e. The value corresponding to the c.f. equal to or just greater than $\frac{i(N+1)}{100}$ is the percentile.
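The steps above can be sketched in Python as follows. The function name `discrete_percentile` is an assumption for illustration, and the data are the wage figures of the example that follows; the values are assumed to be already sorted in ascending order.

```python
from itertools import accumulate


def discrete_percentile(values, freqs, i):
    """i-th percentile of a discrete series (values sorted ascending)."""
    n = sum(freqs)
    position = i * (n + 1) / 100            # i(N+1)/100-th item
    for value, cf in zip(values, accumulate(freqs)):
        if cf >= position:                  # c.f. equal to or just greater
            return value
    return values[-1]


wages = [12, 15, 18, 21, 24, 27]            # wage/day in Rs.'00'
workers = [7, 11, 16, 10, 6, 2]
p15 = discrete_percentile(wages, workers, 15)
p60 = discrete_percentile(wages, workers, 60)
print(p15, p60)
```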


Compute the 15th percentile and 60th percentile from the following data:

Wage/day (in Rs.'00') | 12 | 15 | 18 | 21 | 24 | 27
Number of workers | 7 | 11 | 16 | 10 | 6 | 2

Solution: Computation of 15th percentile and 60th percentile

Wage (Rs.'00') | No. of workers (f) | Cumulative frequency (c.f.)
12 | 7 | 7
15 | 11 | 18
18 | 16 | 34
21 | 10 | 44
24 | 6 | 50
27 | 2 | 52

Position of 15th percentile ($P_{15}$) $= \frac{15(N+1)}{100}$th item $= \frac{15(52+1)}{100}$th item = 7.95th item

From the c.f. table, the c.f. just greater than 7.95 is 18, which corresponds to the value 15 in the wage column. Thus 15th percentile ($P_{15}$) = 15, i.e.

15th percentile ($P_{15}$) = 15 x Rs.100 = Rs.1,500

Again, position of 60th percentile ($P_{60}$) $= \frac{60(N+1)}{100}$th item $= \frac{60(52+1)}{100}$th item = 31.80th item

From the c.f. table, the c.f. just greater than 31.80 is 34, which corresponds to the value 18 in the wage column. Thus 60th percentile ($P_{60}$) = 18, i.e.

60th percentile ($P_{60}$) = 18 x Rs.100 = Rs.1,800

(iii) In Continuous series or Grouped frequency distribution
$P_i = L + \dfrac{\frac{iN}{100} - \text{c.f.}}{f} \times h$, where i = 1, 2, 3, ..., 99

Steps involved are:
a. Arrange the given data in ascending order of magnitude.
b. Obtain the less than cumulative frequency (c.f.).
c. Position of the i-th percentile $= \frac{iN}{100}$th item, where $N = \sum f$ and i = 1, 2, 3, ..., 99.
d. Look up the value of $\frac{iN}{100}$ in the less than c.f. column and note the class interval corresponding to the c.f. either equal to or just greater than $\frac{iN}{100}$.
e. The class interval corresponding to the c.f. equal to or just greater than $\frac{iN}{100}$ is the percentile class.

The actual value of the i-th percentile is given by

$P_i = L + \dfrac{\frac{iN}{100} - \text{c.f.}}{f} \times h$, where i = 1, 2, 3, ..., 99


[Example 2.35] From the following distribution of marks of 500 students of a campus, compute the 20th percentile and 75th percentile.

Marks | 0-20 | 20-40 | 40-50 | 50-60 | 60-80 | 80-100
No. of students | 50 | 100 | 150 | 90 | 60 | 50

Solution:

Marks | No. of students (f) | c.f.
0-20 | 50 | 50
20-40 | 100 | 150
40-50 | 150 | 300
50-60 | 90 | 390
60-80 | 60 | 450
80-100 | 50 | 500

Position of the 20th percentile ($P_{20}$) is given by $\frac{20N}{100} = \frac{20 \times 500}{100} = 100$.

The c.f. just greater than 100 is 150.

So, $P_{20}$ lies in class 20-40. Then, L = 20, f = 100, c.f. = 50, h = 20.

$P_{20} = L + \dfrac{\frac{20N}{100} - \text{c.f.}}{f} \times h = 20 + \dfrac{100 - 50}{100} \times 20 = 30$

The position of the 75th percentile $= \frac{75N}{100} = \frac{3 \times 500}{4} = 375$.

From the c.f. table, the c.f. just greater than 375 is 390. So, $P_{75}$ lies in class 50-60, with L = 50, f = 90, c.f. = 300, h = 10.

So, $P_{75} = L + \dfrac{\frac{75N}{100} - \text{c.f.}}{f} \times h = 50 + \dfrac{375 - 300}{90} \times 10 = 58.33$
Note: a. Median = $Q_2$ = $D_5$ = $P_{50}$
b. All partition values (quartiles, deciles, percentiles), including the median, can be located with the help of the cumulative frequency curve.

[Figure: less than ogive, with $Q_1$, $M_d$ and $Q_3$ located on the variable axis]


2.10 Mode 


Mode is also an important measure of central tendency. Mode is the value (observation) in the series which occurs the maximum number of times, i.e. the value which has the highest frequency.

There are many situations in which the A.M. and the median (Md.) fail to reveal the characteristics of the data. For questions such as the most common stock, most common wage, most common height, most common size of shoe, or size of T-shirts and other ready-made garments, we have to have the mode in mind and not the A.M. or median.

The word "Mode" is derived from the French word "La Mode", which means the fashionable value of the series. For example, let us consider the following statements.

i. Average size of the sandal sold in a shopping mall is 40.
ii. Average height of the British is 5' 10".
iii. An average person spends Rs.3,000 in the Dashain festival.

In the above statements, the average does not refer to the mean or median. The average refers to the mode, which is the most frequent value (observation) in the distribution. From the first statement, for instance, we mean that there is maximum demand for sandals of size 40.


Merits and Demerits of Mode
Merits:
i. It is easy to calculate and understand.
ii. It is not affected by extreme values of the series.
iii. It can be obtained even in a distribution having open-ended classes.
Demerits:
i. It is not rigidly defined.
ii. It is not based on all observations.
iii. It is not suitable for further mathematical treatment.
iv. It is affected by fluctuations of sampling.

Computation of Mode
a) In Individual Series: In case of individual series, the mode is the variate value (observation) that occurs the maximum number of times. In other words, the most repeated value in the data is the modal value. For example, the mode of the data 10, 13, 9, 11, 10, 20, 13, 10, 19, 18 is 10 because it repeats the maximum number of times (3 times).

b) In Discrete Series: In case of discrete series, the mode is the value which corresponds to the highest frequency. For example,

Marks obtained (x) | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90
No. of students (f) | 3 | 6 | 9 | 20 | 25 | 18 | 7 | 3 | 1

In the above table, the highest frequency is 25, which corresponds to the marks (value) 50. Therefore the mode of the frequency distribution of marks of students is 50.

c) In Continuous Series: In case of continuous series, the class interval in which the mode lies can be obtained by inspection. In other words, the class interval having the highest frequency in the frequency column is the modal class. Then the mode can be calculated by using the formula below:

Mode ($M_o$) $= L + \dfrac{\Delta_1}{\Delta_1 + \Delta_2} \times h$, for $\Delta_1 + \Delta_2 \neq 0$

where, L = Lower limit of the modal class
$\Delta_1$ = Difference between the highest frequency and its preceding frequency ($f_1 - f_0$)
$\Delta_2$ = Difference between the highest frequency and its succeeding frequency ($f_1 - f_2$)
h = Size (width) of the modal class interval


Example 2.36 | Calculate the mode for the following data. 


[Class interval 
[Frequency 


Solution: Since 20 is the highest frequency, the class interval (60-80) is the modal class.
Here, L = 60, $f_1$ = 20, $f_0$ = 13, $f_2$ = 14, h = 20. Then
$\Delta_1 = f_1 - f_0 = 20 - 13 = 7$, $\Delta_2 = f_1 - f_2 = 20 - 14 = 6$

Mode ($M_o$) $= L + \dfrac{\Delta_1}{\Delta_1 + \Delta_2} \times h = 60 + \dfrac{20 - 13}{(20 - 13) + (20 - 14)} \times 20 = 70.77$
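As a quick check of the arithmetic, the modal-class formula can be written as a small Python function. The name `grouped_mode` and its parameter names are chosen here for illustration only.

```python
def grouped_mode(lower, h, f1, f0, f2):
    """Mode of a continuous series from its modal class.

    lower: lower limit of the modal class, h: class width,
    f1: highest frequency, f0 / f2: preceding / succeeding frequency.
    """
    d1 = f1 - f0
    d2 = f1 - f2
    if d1 + d2 == 0:
        raise ValueError("formula is undefined when d1 + d2 = 0")
    return lower + d1 / (d1 + d2) * h


mode = grouped_mode(lower=60, h=20, f1=20, f0=13, f2=14)
print(round(mode, 2))
```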


Graphically, the mode of a continuous series can also be located by presenting the data in a histogram, as shown below:

[Figure: histogram of frequency against class interval, with the mode read off inside the tallest bar (60-80)]

Therefore, Mode ($M_o$) = 70 (approximately)

While applying the above formula to calculate the mode, the following assumptions must be considered:

i. The frequency distribution must be continuous, with exclusive type class intervals without any gaps.
ii. The size (width) of all class intervals must be the same.

The above formula is not practicable in the following conditions:

i. The maximum frequency is repeated (not unique).
ii. The maximum frequency occurs either at the very beginning or at the end of the distribution.
iii. The difference between the maximum frequency and the frequencies preceding and succeeding it is very small.
iv. There are irregularities in the distribution, i.e. the frequencies of the variate values (observations) increase or decrease in a haphazard way.


Grouping Method of Computing Mode 


If the above formula is not practicable and the frequency distribution is not regular, then the grouping method is suitable to compute the mode.

The method of grouping is carried out in the following steps. A grouping table of six columns is prepared as follows:

Column I : The original frequencies.
Column II : The frequencies of column I combined two by two.
Column III : The frequencies of column I combined two by two after leaving the first frequency.
Column IV : The frequencies of column I combined three by three.
Column V : The frequencies of column I combined three by three after leaving the first frequency.
Column VI : The frequencies of column I combined three by three after leaving the first two frequencies.

The maximum frequency in each column must be marked. After completing the grouping table, an analysis table must be prepared, which records the values of the variable contributing to the marked maximum of each column. From the analysis table, the value that occurs the maximum number of times is the mode.


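[Example 2.37] Determine the mode of the distribution from the following data:

x | 20 | 25 | 30 | 35 | 40 | 45 | 50 | 55 | 60 | 70
f | 5 | 6 | 8 | 10 | 12 | 3 | 2 | 6 | 1 | 4

Solution: Since the frequency distribution is of irregular type, we obtain the mode by the grouping method as below:

Grouping Table (each sum is written against the first value of its group; * marks the maximum of the column)

x | I | II | III | IV | V | VI
20 | 5 | 11 | | 19 | |
25 | 6 | | 14 | | 24* |
30 | 8 | 18* | | | | 30*
35 | 10 | | 22* | 25* | |
40 | 12* | 15 | | | 17 |
45 | 3 | | 5 | | | 11
50 | 2 | 8 | | 9 | |
55 | 6 | | 7 | | 11 |
60 | 1 | 5 | | | |
70 | 4 | | | | |

Analysis Table

Column | 25 | 30 | 35 | 40 | 45
I | | | | 1 |
II | | 1 | 1 | |
III | | | 1 | 1 |
IV | | | 1 | 1 | 1
V | 1 | 1 | 1 | |
VI | | 1 | 1 | 1 |
Total | 1 | 3 | 5 | 4 | 1

From the above table, 35 occurs the maximum number of times, i.e. 5 times.

Mode = 35.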
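The grouping and analysis tables can also be built mechanically. The sketch below is one possible way to do it in Python; the function name `grouping_mode` is hypothetical, and when two groups in a column tie for the maximum it keeps only the first, whereas a hand-made table would mark both.

```python
from collections import Counter


def grouping_mode(values, freqs):
    """Mode by the grouping method; returns (mode, analysis tally)."""
    n = len(freqs)
    tally = Counter()
    # (group size, starting offset) for columns I, II, III, IV, V, VI
    for size, start in [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]:
        groups = [range(j, j + size) for j in range(start, n - size + 1, size)]
        # group with the largest combined frequency in this column
        best = max(groups, key=lambda g: sum(freqs[k] for k in g))
        for k in best:
            tally[values[k]] += 1        # one mark in the analysis table
    return tally.most_common(1)[0][0], tally


x = [20, 25, 30, 35, 40, 45, 50, 55, 60, 70]
f = [5, 6, 8, 10, 12, 3, 2, 6, 1, 4]
mode, tally = grouping_mode(x, f)
print(mode, tally[mode])
```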


Relationship between Mean, Median and Mode 

a) For a symmetrical frequency distribution, Arithmetic Mean ($\bar{X}$), Median ($M_d$) and Mode ($M_o$) are identical, i.e., $\bar{X} = M_d = M_o$

b) For a moderately skewed frequency distribution, Mean ($\bar{X}$), Median ($M_d$) and Mode ($M_o$) are not identical, i.e., $\bar{X} \neq M_d \neq M_o$. They satisfy the following relationship:
i. Mean - Mode = 3(Mean - Median)
ii. Mode = 3 Median - 2 Mean
This relationship was established by Karl Pearson and is known as the empirical relationship between Mean, Mode and Median.

Empirical Relationship between Mean, Mode and Median

If a frequency distribution is bimodal or multimodal (i.e. it has two, or three or more, modes), the mode is ill-defined. Such a situation creates inconvenience in further statistical calculation and analysis. In this case the mode is computed by using the empirical relationship given by Karl Pearson:

Mode = 3 Median - 2 Mean, i.e., $M_o = 3M_d - 2\bar{X}$

Note: Types of data based on mode are as follows:

1. Non-modal data 2. Unimodal data 3. Bimodal data 4. Multimodal data


Worked Out Examples
[Example 2.38] The following data are the monthly salaries (in Rs.'00') of 50 employees in a factory.
30 45 48 55 39 25 31 12. 18 Sy. 54 59 51 


33 43 44 10 
38 19 26 41 35 37 4] 46 33 51 37 58 58 17 19 23 26 
29 38 57 36 35 44 43 27 19 43 22 3 47 34 31 15 


Prepare a frequency table with class intervals 10-20, 20-30, and so on, and compute the average salary from the frequency table obtained.

Solution:

Class interval | Frequency (f), by tally | Mid value (X) | fX
10-20 | 8 | 15 | 120
20-30 | 8 | 25 | 200
30-40 | 15 | 35 | 525
40-50 | 11 | 45 | 495
50-60 | 8 | 55 | 440
Total | N = 50 | | $\sum fX$ = 1780

Arithmetic Mean ($\bar{X}$) $= \dfrac{\sum fX}{N} = \dfrac{1780}{50} = 35.6$

Average salary = 35.6 x Rs.100 = Rs.3,560


[Example 2.39] Find the average marks of students from the following table:

Marks obtained | 0 and less than 10 | less than 20 | less than 30 | less than 40 | less than 50
No. of students | 15 | 22 | 38 | 40 | 50

Solution: The given frequency distribution is a less than cumulative frequency distribution, so at first it is required to convert it into a normal (simple) frequency distribution. Construction of the normal frequency table and computation of the average are as below:

Marks | No. of students (f) | Mid value (X) | fX
0-10 | 15 | 5 | 75
10-20 | 22 - 15 = 7 | 15 | 105
20-30 | 38 - 22 = 16 | 25 | 400
30-40 | 40 - 38 = 2 | 35 | 70
40-50 | 50 - 40 = 10 | 45 | 450
Total | N = 50 | | $\sum fX$ = 1100

Average Marks ($\bar{X}$) $= \dfrac{\sum fX}{N} = \dfrac{1100}{50} = 22$


Example 2.40| Find the arithmetic mean from the following data: | 


Expenditure (Rs '000') No. of people 
Above 0 
Above 100 
Above 200 
Above 300 
Above 400 
Above 500 
Above 600 


Solution: The given frequency table is a more than cumulative frequency table. So, at first, we have to reconstruct the normal frequency distribution for the calculation of the arithmetic mean.

Computation of arithmetic mean

Expenditure (Rs.'00000') | Number of people (f) | Mid value (X) | fX
0-1 | 20 | 0.5 | 10
1-2 | 30 | 1.5 | 45
2-3 | 50 | 2.5 | 125
3-4 | 20 | 3.5 | 70
4-5 | 18 | 4.5 | 81
5-6 | 12 | 5.5 | 66
6-7 | 0 | 6.5 | 0
Total | N = 150 | | $\sum fX$ = 397

Here, Arithmetic Mean ($\bar{X}$) $= \dfrac{\sum fX}{N} = \dfrac{397}{150} = 2.64667$

Arithmetic Mean ($\bar{X}$) = 2.64667 x Rs.1,00,000 = Rs.2,64,666.67


[Example 2.41] Mr. Kamal has invested his capital in three Banks, namely Kist, Sanima and Everest: Rs.60,000, Rs.70,000 and Rs.1,00,000 respectively. If he gets dividends of Rs.20,000 from each Bank, calculate his average rate of return from the three Banks.

Solution: Average rate of return $= \dfrac{\text{Total return}}{\text{Total investment}} \times 100\%$

Investment of Mr. Kamal in Kist Bank = Rs.60,000
Investment of Mr. Kamal in Sanima Bank = Rs.70,000
Investment of Mr. Kamal in Everest Bank = Rs.1,00,000
Total investment = Rs.(60,000 + 70,000 + 1,00,000) = Rs.2,30,000
He gets dividends of Rs.20,000 from each Bank.
So, Total return = 3 x Rs.20,000 = Rs.60,000

Average rate of return $= \dfrac{\text{Total return}}{\text{Total investment}} \times 100\% = \dfrac{60{,}000 \times 100}{2{,}30{,}000}\% = 26.09\%$


[Example 2.42] Goals scored by a football striker in 5 matches are 6, 4, 3, 0 and 1. What is the number of goals that the striker must score in the 6th match in order that the average comes to 4 goals per match?

Solution: Here, the goals scored by the striker in 5 matches are 6, 4, 3, 0 and 1.
Let the number of goals scored by him in the 6th match be x.

Average goals ($\bar{X}$) $= \dfrac{\text{Total sum of goals}}{\text{No. of matches}}$

or, $4 = \dfrac{6 + 4 + 3 + 0 + 1 + x}{6}$, so 24 = 14 + x, which gives x = 10

Therefore, the required number of goals in the 6th match is 10.

Solution: Let the missing frequency corresponding to 30 be $f_3$.

X | f | fX
10 | 5 | 50
20 | 8 | 160
30 | $f_3$ | $30f_3$
40 | 9 | 360
50 | 7 | 350
60 | 1 | 60
Total | $N = 30 + f_3$ | $\sum fX = 980 + 30f_3$

Here, $\sum f = 30 + f_3$ and $\sum fX = 980 + 30f_3$.

Now, we have, $\bar{X} = \dfrac{\sum fX}{N}$, i.e. $32 = \dfrac{980 + 30f_3}{30 + f_3}$, so $960 + 32f_3 = 980 + 30f_3$, which gives $f_3 = 10$


Calculate the average income from the income distribution of 1400 workers of a factory. 


200 


Income in Rs. 
Below 1500 
1500 — 2000 
2000 — 2500 
2500 — 3000 
3000 — 3500 
3500 — 4000 

4000 and above 


Solution: Only the average is asked, and the simple average means the arithmetic mean, so we calculate the arithmetic mean for the given data. The data are given in a continuous series, and the lower limit of the first class and the upper limit of the last class are unknown. So we take 1000 and 4500 as the lower limit of the first class and the upper limit of the last class respectively, fixing the class interval of size 500 for our convenience. Then

Taking assumed mean (A) = 2750 and common factor (h) = 500

Income in Rs. | Mid value (X) | $d = \frac{X - 2750}{500}$ | No. of workers (f) | fd
1000-1500 | 1250 | -3 | 200 | -600
1500-2000 | 1750 | -2 | 225 | -450
2000-2500 | 2250 | -1 | 275 | -275
2500-3000 | 2750 | 0 | 250 | 0
3000-3500 | 3250 | 1 | 180 | 180
3500-4000 | 3750 | 2 | 150 | 300
4000-4500 | 4250 | 3 | 120 | 360
Total | | | N = 1400 | $\sum fd$ = -485

Arithmetic mean ($\bar{X}$) $= A + \dfrac{\sum fd}{N} \times h = 2750 + \dfrac{-485}{1400} \times 500$ = Rs.2576.786
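The step-deviation computation can be reproduced with a short Python sketch. The function name `step_deviation_mean` is an assumption for illustration; the class mid-values rely on the same closing limits (1000 and 4500) that the solution assumes for the open-ended classes.

```python
def step_deviation_mean(mids, freqs, assumed_mean, h):
    """Arithmetic mean by the step-deviation method: A + (sum(fd) / N) * h."""
    n = sum(freqs)
    sum_fd = sum(f * (x - assumed_mean) / h for x, f in zip(mids, freqs))
    return assumed_mean + sum_fd / n * h


mids = [1250, 1750, 2250, 2750, 3250, 3750, 4250]
freqs = [200, 225, 275, 250, 180, 150, 120]
mean = step_deviation_mean(mids, freqs, assumed_mean=2750, h=500)

# The direct method (sum of fX over N) must give the same result.
direct = sum(f * x for x, f in zip(mids, freqs)) / sum(freqs)
print(round(mean, 3), round(direct, 3))
```

The choice of A and h only simplifies hand calculation; it does not change the value of the mean.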
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[Example 2.45 | The average of 100 observations is 72. Later, it was discovered that two observations 85 
and 63 were misread as 58 and 36. Find the correct average of 100 observations.

Solution: Here, n = 100, $\bar{X}$ = 72
Sum of observations ($\sum X$) $= n \times \bar{X}$ = 100 x 72 = 7200
Now, correct $\sum X$ = 7200 - 58 - 36 + 85 + 63 = 7254

Therefore, correct average ($\bar{X}$) $= \dfrac{\text{Correct } \sum X}{n} = \dfrac{7254}{100} = 72.54$


Example 2.46] The following are the monthly salaries in US dollars of 20 workers of a firm. 


130 62 145 118 125 76 151 142 110 98 
65 116 100 103 71 85 80 122 132 95 


The firm gives bonuses of US dollars 10, 15, 20, 25 and 30 to the workers in the respective salary groups: exceeding US dollar 60 but not exceeding US dollar 80, exceeding US dollar 80 but not exceeding US dollar 100, and so on. Find the average monthly bonus paid per worker.

Solution: Construction of frequency table and computation of average as below:

Salary (in $) | Frequency (f), by tally | Bonus ($) (X) | fX
60-80 | 5 | 10 | 50
80-100 | 4 | 15 | 60
100-120 | 4 | 20 | 80
120-140 | 4 | 25 | 100
140-160 | 3 | 30 | 90
Total | N = 20 | | $\sum fX$ = 380

Therefore, average monthly bonus paid per worker ($\bar{X}$) $= \dfrac{\sum fX}{N} = \dfrac{380}{20}$ = $19.
[Example 2.47] Calculate the average daily wages for the workers of two firms together.

 | Firm A | Firm B
No. of workers | 100 | 200
Average daily wage (Rs.) | 200 | 150

Solution: Here,

No. of workers in Firm A ($n_A$) = 100, No. of workers in Firm B ($n_B$) = 200
Average daily wage in Firm A ($\bar{X}_A$) = Rs.200, Average daily wage in Firm B ($\bar{X}_B$) = Rs.150

Combined Mean ($\bar{X}_{AB}$) $= \dfrac{n_A \bar{X}_A + n_B \bar{X}_B}{n_A + n_B} = \dfrac{100 \times 200 + 200 \times 150}{100 + 200}$ = Rs.166.67
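The combined mean is just a size-weighted average of the group means, which a few lines of Python make explicit. The helper name `combined_mean` is chosen here for illustration and works for any number of groups.

```python
def combined_mean(sizes, means):
    """Combined mean of several groups from their sizes and means."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# Firm A: 100 workers at Rs.200; Firm B: 200 workers at Rs.150.
combined = combined_mean([100, 200], [200, 150])
print(round(combined, 2))
```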


Solution:

Let $n_1$ and $n_2$ be the number of male and female employees respectively.

Here, we have, $25{,}000 = \dfrac{27{,}000\,n_1 + 17{,}000\,n_2}{n_1 + n_2}$

or, $25{,}000\,n_2 - 17{,}000\,n_2 = 27{,}000\,n_1 - 25{,}000\,n_1$

or, $8{,}000\,n_2 = 2{,}000\,n_1$, i.e. $\dfrac{n_1}{n_2} = \dfrac{8000}{2000} = \dfrac{4}{1}$

The ratio of male employees to female employees = 4:1
Let $n_1$ = 4K and $n_2$ = K

Then percentage of male employees $= \dfrac{4K}{4K + K} \times 100\% = \dfrac{4K}{5K} \times 100\% = 80\%$

and percentage of female employees $= \dfrac{K}{4K + K} \times 100\% = \dfrac{K}{5K} \times 100\% = 20\%$
[Example 2.49] Out of 100 students, the results of those who secured less than 60% marks are given below:

Marks obtained | 0-20 | 20-40 | 40-60
Number of students | 16 | 24 | 30

If the average mark of all students was 50, find out the average mark of those who secured more than 60% marks.

Solution: At first, calculation of the arithmetic mean of the given data

Marks obtained | Mid value (X) | No. of students (f) | fX
0-20 | 10 | 16 | 160
20-40 | 30 | 24 | 720
40-60 | 50 | 30 | 1500
Total | | $N_1$ = 70 | $\sum fX$ = 2380

Let us assume that
$\bar{X}_1$ = Average marks of students securing less than 60% and $\bar{X}_2$ = Average marks of students securing more than 60%

We know, $\bar{X}_1 = \dfrac{\sum fX}{N_1} = \dfrac{2380}{70} = 34$. Given, $N_1 + N_2 = 100$, so $N_2 = 100 - 70 = 30$, $\bar{X}_{12} = 50$, $\bar{X}_2$ = ?

Using the combined mean formula, we get

$\bar{X}_{12} = \dfrac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$, i.e. $50 = \dfrac{70 \times 34 + 30\,\bar{X}_2}{100} = \dfrac{2380 + 30\,\bar{X}_2}{100}$, which gives $\bar{X}_2 = 87.33$

Therefore, average marks of the students who secured more than 60% = 87.33.


[Example 2.50] A student obtained the following marks out of 100 in the S.L.C. Examination: English 65, Mathematics 90, Science 75 and Accountancy 95. Find the student's weighted arithmetic mean if weights 1, 2, 3 and 4 respectively are allocated to the subjects.

Solution: Computation of weighted mean

Subject | Marks obtained (X) | Weight (w) | wX
English | 65 | 1 | 65
Mathematics | 90 | 2 | 180
Science | 75 | 3 | 225
Accountancy | 95 | 4 | 380
Total | | $\sum w$ = 10 | $\sum wX$ = 850

Weighted arithmetic mean ($\bar{X}_w$) $= \dfrac{\sum wX}{\sum w} = \dfrac{850}{10} = 85$
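A weighted mean of this kind is easy to verify in Python. The function name `weighted_mean` is an assumption for illustration.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(wX) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


marks = [65, 90, 75, 95]      # English, Mathematics, Science, Accountancy
weights = [1, 2, 3, 4]
wm = weighted_mean(marks, weights)
print(wm)
```

With equal weights the same function returns the simple arithmetic mean, here 81.25, so the heavier weights on the stronger subjects are what lift the result to 85.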
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[Example 2.51| a) Three types of workers are employed in each of two factories, but at different rate of 
wages as follows:

Types of worker | Factory A: Rate of wages (Rs.) | Factory A: No. of workers | Factory B: Rate of wages (Rs.) | Factory B: No. of workers
Skilled | 350 | 7 | 300 | 20
Semi-skilled | 250 | 10 | 300 | 30
Unskilled | 200 | 8 | 275 | 25

In which of the two factories is the average wage higher?

Solution: For factory A

Types of worker | Rate of wages (X) | No. of workers (w) | wX
Skilled | 350 | 7 | 2450
Semi-skilled | 250 | 10 | 2500
Unskilled | 200 | 8 | 1600
Total | | $\sum w$ = 25 | $\sum wX$ = 6550

Average wage $\bar{X}_w(A) = \dfrac{\sum wX}{\sum w} = \dfrac{6550}{25}$ = Rs.262

For factory B

Types of worker | Rate of wages (X) | No. of workers (w) | wX
Skilled | 300 | 20 | 6000
Semi-skilled | 300 | 30 | 9000
Unskilled | 275 | 25 | 6875
Total | | $\sum w$ = 75 | $\sum wX$ = 21875

Average wage $\bar{X}_w(B) = \dfrac{\sum wX}{\sum w} = \dfrac{21{,}875}{75}$ = Rs.291.67

Hence, the average wage in factory B is higher than in factory A by Rs.(291.67 - 262) = Rs.29.67


b) The marks obtained by two candidates A and B in a scholarship test are given below, where the weights of the various subjects are different.

Subjects | Weights | Marks obtained by A | Marks obtained by B
Statistics | 4 | 63 | 60
Accountancy | 2 | 65 | 64
English | 2 | 60 | 66
GK | 2 | 60 | 50

If the performance of the candidates is the criterion for awarding the scholarship, who should be awarded it?

Solution: Computation of Weighted Arithmetic Means

Subjects | Candidate A: Marks (x) | w | wx | Candidate B: Marks (x) | w | wx
Statistics | 63 | 4 | 252 | 60 | 4 | 240
Accountancy | 65 | 2 | 130 | 64 | 2 | 128
English | 60 | 2 | 120 | 66 | 2 | 132
GK | 60 | 2 | 120 | 50 | 2 | 100
Total | | $\sum w$ = 10 | $\sum wx$ = 622 | | $\sum w$ = 10 | $\sum wx$ = 600

Average score of candidate A, $\bar{X}_w(A) = \dfrac{\sum wx}{\sum w} = \dfrac{622}{10} = 62.2$

Average score of candidate B, $\bar{X}_w(B) = \dfrac{\sum wx}{\sum w} = \dfrac{600}{10} = 60$

Since $\bar{X}_w(A) > \bar{X}_w(B)$, A's performance is better than B's performance in the test. So, candidate A should be awarded the scholarship.


[Example 2.524 From the following table, compute A.M. and G.M., hence verify that A.M. > G.M. 
Class Interval 


0-10 [10-20] 20-30| 30-40 | 4050 


Solution: 


X 370 
25% _ 390 So%495 and GM. =Antilog (eet = Antilog (A) = 19.86 


Xf log X = 20.775 


AM See 1G 


Hence, A.M. > G.M.


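[Example 2.53] Calculate the harmonic mean from the following data.

X | 10 | 20 | 30 | 40 | 50
f | 7 | 8 | 6 | 5 | 7

Solution: Computation of harmonic mean

X | f | $\frac{1}{X}$ | $f \cdot \frac{1}{X}$
10 | 7 | 0.1 | 0.70
20 | 8 | 0.05 | 0.40
30 | 6 | 0.033 | 0.20
40 | 5 | 0.025 | 0.125
50 | 7 | 0.02 | 0.14
Total | N = 33 | | $\sum f \cdot \frac{1}{X}$ = 1.565

Harmonic Mean (H.M.) $= \dfrac{N}{\sum f \cdot \frac{1}{X}} = \dfrac{33}{1.565} = 21.09$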

A driver travels from Bhairahawa to Butwal at an average speed of 40 km/hr, Butwal to Narayanghat at an average speed of 45 km/hr and Narayanghat to Hetauda at an average speed of 42 km/hr. What is his average speed during the trip from Bhairahawa to Hetauda?

Solution: Let X (km/hr) denote the average speeds on the three stretches, so that n = 3.

Average Speed (H.M.) $= \dfrac{n}{\sum \frac{1}{X}} = \dfrac{3}{\frac{1}{40} + \frac{1}{45} + \frac{1}{42}} = \dfrac{3}{0.07103} \approx 42.23$

Therefore, the average speed during the trip from Bhairahawa to Hetauda is 42.23 km/hr.


Find $Q_1$, $D_3$ and $P_{65}$ from the given data: 8, 6, 5, 4, 10, 15, 3, 16

Solution: Here, the number of observations, i.e. n = 8
First, the data are arranged in ascending order: 3, 4, 5, 6, 8, 10, 15, 16.

$Q_1$ = Value of the $\left(\frac{n+1}{4}\right)$th item = Value of the $\left(\frac{8+1}{4}\right)$th item = 2.25th item
= 2nd item + 0.25 (3rd item - 2nd item) = 4 + 0.25 (5 - 4) = 4.25

Similarly, $D_3$ = Value of the $\frac{3(n+1)}{10}$th item = Value of the $\frac{3(8+1)}{10}$th item = Value of the 2.7th item
= Value of 2nd item + 0.7 (3rd item - 2nd item) = 4 + 0.7 (5 - 4) = 4.7

$P_{65}$ = Value of the $\frac{65(n+1)}{100}$th item = Value of the $\frac{65(8+1)}{100}$th item
= Value of the 5.85th item = Value of 5th item + 0.85 (6th item - 5th item)
= 8 + 0.85 (10 - 8) = 9.7


[Example 2.56 | Find upper quartile and upper decile from the given data. Also obtain P77. 


Solution: Calculation of partition values

x | f | Less than c.f.
1 | 2 | 2
2 | 5 | 7
3 | 8 | 15
4 | 10 | 25
5 | 12 | 37
6 | 8 | 45
7 | 6 | 51
8 | 4 | 55
9 | 3 | 58
... | ... | ...
Total | N = 61 |

For $Q_3$: $Q_3$ = Value of the $3\left(\frac{N+1}{4}\right)$th item = Value of the $3\left(\frac{61+1}{4}\right)$th item = Value of the 46.5th item.

The value of c.f. just greater than 46.5 is 51. So, the upper quartile, i.e. $Q_3$ = 7.

Upper decile ($D_9$) = Value of the $9\left(\frac{N+1}{10}\right)$th item = Value of the $9\left(\frac{61+1}{10}\right)$th item = Value of the 55.8th item.
The value of c.f. just greater than 55.8 is 58.

So the upper decile, i.e. $D_9$ = 9.

For $P_{77}$: $P_{77}$ = Value of the $77\left(\frac{N+1}{100}\right)$th item = Value of the $\frac{77 \times 62}{100}$th item = Value of the 47.74th item.
The value of c.f. just greater than 47.74 is 51.

Therefore, $P_{77}$ = 7


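[Example 2.57] From the given data, calculate the number of workers getting wages between the 1st and 3rd quartiles.

Wage in Rs. (less than) | 10 | 20 | 30 | 40 | 50
Number of workers | 45 | 130 | 290 | 365 | 400

Solution: The given frequency distribution is in the less than cumulative frequency form. So, it should be converted into an ordinary (simple) frequency distribution.

Calculation of Quartiles

Wage in Rs. | No. of workers (f) | c.f.
0-10 | 45 | 45
10-20 | 85 | 130
20-30 | 160 | 290
30-40 | 75 | 365
40-50 | 35 | 400

For $Q_1$: The position of $Q_1 = \frac{N}{4} = \frac{400}{4} = 100$; the c.f. just greater than 100 is 130.

So, $Q_1$ lies in the class 10-20, with L = 10, c.f. = 45, f = 85 and h = 10.

Now, $Q_1 = L + \dfrac{\frac{N}{4} - \text{c.f.}}{f} \times h = 10 + \dfrac{100 - 45}{85} \times 10$ = Rs.16.47

For $Q_3$: The position of $Q_3 = \frac{3N}{4} = \frac{3 \times 400}{4} = 300$; the c.f. just greater than 300 is 365.

So, $Q_3$ lies in the class 30-40, with L = 30, c.f. = 290, f = 75 and h = 10.

Now, $Q_3 = L + \dfrac{\frac{3N}{4} - \text{c.f.}}{f} \times h = 30 + \dfrac{300 - 290}{75} \times 10$ = Rs.31.33

The number of workers getting wages between Rs.16.47 and Rs.31.33

$= \dfrac{20 - 16.47}{10} \times 85 + 160 + \dfrac{31.33 - 30}{10} \times 75 = 199.98 \approx 200$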


Example 2.58 | The marks distribution of 150 students in a class test as follows: 


[Marks (less than) | 10 | 20 | 30 | 40 | 50 | 60 | 70 
No. of students 8 28 | 50 | 100 | 120 | 142 | 145 


If the campus has a policy to run a special class for the top 25% of the students what is the - 


lowest mark obtained by those top 25% of the students. 
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Solution: The lowest mark obtained by those top 25% of the students is given by P75 i.e. Q3 
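[Figure: distribution of marks with the top 25% of the students lying above $P_{75} = Q_3$]

Marks | No. of students (f) | c.f.
Less than 10 | 8 | 8
10-20 | 28 - 8 = 20 | 28
20-30 | 50 - 28 = 22 | 50
30-40 | 100 - 50 = 50 | 100
40-50 | 120 - 100 = 20 | 120
50-60 | 142 - 120 = 22 | 142
60-70 | 145 - 142 = 3 | 145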


The position of $P_{75}$ = $\frac{75N}{100}$th item = 102.75th item

The c.f. just greater than 102.75 is 120. So, $Q_3$ or $P_{75}$ lies in class 40-50,
where, L = 40, h = 10, f = 20, c.f. = 100

Now, $P_{75} = L + \dfrac{\frac{75N}{100} - \text{c.f.}}{f} \times h = 40 + \dfrac{102.75 - 100}{20} \times 10 = 41.375$

The lowest mark obtained by those top 25% students is 41.375.


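[Example 2.59] From the following distribution of marks of 500 students of a campus, find the minimum pass mark if only 20% of the students had failed, and also find the minimum marks obtained by the top 25% of the students.

Marks | 0-20 | 20-40 | 40-50 | 50-60 | 60-80 | 80-100
No. of students | 50 | 100 | 150 | 90 | 60 | 50

Solution: If 20% of the students had failed, it means 80% of the students had passed. The minimum pass mark obtained by the students is given by $P_{20}$.

[Figure: distribution of marks split at $P_{20}$ into 20% failed and 80% passed]

Calculation of partition values

Marks | No. of students (f) | c.f.
0-20 | 50 | 50
20-40 | 100 | 150
40-50 | 150 | 300
50-60 | 90 | 390
60-80 | 60 | 450
80-100 | 50 | 500

The position of $P_{20} = \frac{20N}{100} = \frac{20 \times 500}{100} = 100$. The c.f. just greater than 100 is 150.

So, $P_{20}$ lies in class 20-40, with L = 20, f = 100, c.f. = 50, h = 20.

Now, $P_{20} = L + \dfrac{\frac{20N}{100} - \text{c.f.}}{f} \times h = 20 + \dfrac{100 - 50}{100} \times 20 = 30$

Therefore, the required minimum pass mark = 30

For the top 25%:

The minimum marks obtained by the top 25% of the students is given by $P_{75}$.

[Figure: distribution of marks with the top 25% lying above $P_{75} = Q_3$]

For $P_{75}$:
The position of $P_{75} = \frac{75N}{100} = \frac{3 \times 500}{4} = 375$
The c.f. just greater than 375 is 390. So, $P_{75}$ lies in class 50-60, with L = 50, f = 90, c.f. = 300, h = 10.

Now, $P_{75} = L + \dfrac{\frac{75N}{100} - \text{c.f.}}{f} \times h = 50 + \dfrac{375 - 300}{90} \times 10 = 58.33 \approx 58$

Therefore, the minimum marks obtained by the top 25% students = 58.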


[Example 2.60] The mark distribution of 104 students is given below.

Central value of group | 10 | 20 | 30 | 40 | 50 | 60 | 70
Number of students | 7 | 8 | 13 | 29 | 35 | 9 | 3

Find the pass mark if 78 students passed the examination.

Solution: Percentage of passed students $= \frac{78 \times 100}{104} = 75\%$, i.e. 25% of the students failed. The pass mark (i.e. the minimum pass mark) when 78 students, or 75% of the students, passed is given by $P_{25}$ for the given mark distribution.

Here, the given distribution is in central value (i.e. mid-value) form. So, at first we have to construct the class intervals as below.

Class size (h) = difference between two successive mid-values = 20 - 10 = 10

Subtract $\frac{h}{2}$ = 5 from the first mid-value for the lower limit of the first class interval and add 5 to the same mid-value for the upper limit of the first class interval, and so on. Other class intervals are constructed in a similar fashion, as shown in the calculation table below:

Marks | No. of students (f) | c.f.
5-15 | 7 | 7
15-25 | 8 | 15
25-35 | 13 | 28
35-45 | 29 | 57
45-55 | 35 | 92
55-65 | 9 | 101
65-75 | 3 | 104

For $P_{25}$: The position of $P_{25}$ is given by $\frac{25N}{100} = \frac{25 \times 104}{100} = 26$.

The c.f. just greater than 26 is 28. So $P_{25}$ lies in class 25-35.
L = 25, f = 13, c.f. = 15, h = 10

Now, $P_{25} = L + \dfrac{\frac{25N}{100} - \text{c.f.}}{f} \times h = 25 + \dfrac{26 - 15}{13} \times 10 = 33.46$

The required pass mark = 33.46


[Example 2.61] The table given below represents the daily wage distribution of 130 workers. Find out the range of income of the middle 60% workers.

Wage (Rs. per week) | More than 70 | More than 85 | More than 100 | More than 115 | More than 130 | More than 145 | More than 160 | More than 175
No. of workers | 130 | 122 | 109 | 79 | 44 | 26 | 14 | 5

Solution: The limits of income of the middle 60% workers are given by $P_{20}$ and $P_{80}$.

[Figure: distribution of wages with the middle 60% lying between $P_{20}$ and $P_{80}$]

Wage (Rs. per week) | No. of workers (f) | c.f.
70-85 | 130 - 122 = 8 | 8
85-100 | 122 - 109 = 13 | 21
100-115 | 109 - 79 = 30 | 51
115-130 | 79 - 44 = 35 | 86
130-145 | 44 - 26 = 18 | 104
145-160 | 26 - 14 = 12 | 116
160-175 | 14 - 5 = 9 | 125
More than 175 | 5 | 130

For $P_{20}$: The position of $P_{20} = \frac{20N}{100} = \frac{20 \times 130}{100} = 26$. The c.f. just greater than 26 is 51. So, $P_{20}$ lies in class interval (100-115). Then, we have,

L = 100, f = 30, c.f. = 21, h = 15

$P_{20} = L + \dfrac{\frac{20N}{100} - \text{c.f.}}{f} \times h = 100 + \dfrac{26 - 21}{30} \times 15$ = Rs.102.50

For $P_{80}$: The position of $P_{80} = \frac{80N}{100} = \frac{80 \times 130}{100} = 104$. The c.f. is just equal to 104. So, $P_{80}$ lies in class interval (130-145), with L = 130, f = 18, c.f. = 86, h = 15.

$P_{80} = L + \dfrac{\frac{80N}{100} - \text{c.f.}}{f} \times h = 130 + \dfrac{104 - 86}{18} \times 15$ = Rs.145

The range of income of the middle 60% workers = $P_{80} - P_{20}$ = 145 - 102.50 = Rs.42.5.

[Example 2.62] The following frequency distribution of marks of 100 students is given below. Frequencies corresponding to two groups are missing from the table. The median is known to be 49.5 marks.


, 
\Marks 0-19 20-39 40-59 60-79 80—99 
|Number of students | 14 3 2% |? 16 


i. Find the missing frequencies. 
ii. Calculate the limits of marks obtained by middle 60% students. 
iii. Highest marks of lower 25% students iv. Lowest marks of higher 75% students. 


Solution: The given frequency distribution of marks of the students is of inclusive type, so it is required to convert it into exclusive type before computing the median and partition values. This is carried out by determining the correction factor (C.F.) = $\frac{1}{2}(20 - 19) = 0.5$.

Reconstruction of the frequency distribution and computation of partition values

Marks | No. of students (f) | c.f.
-0.5 to 19.5 | 14 | 14
19.5 to 39.5 | a (suppose) | 14 + a
39.5 to 59.5 | 26 | 40 + a
59.5 to 79.5 | b (suppose) | 40 + a + b
79.5 to 99.5 | 16 | 56 + a + b
 | N = 56 + a + b |

Let a and b be the missing frequencies of the class intervals (19.5 to 39.5) and (59.5 to 79.5) respectively.

i) Since the median marks is 49.5, which lies in the class (39.5 to 59.5), the median class = (39.5 to 59.5).

Median $(M_d) = L + \frac{\frac{N}{2} - c.f.}{f} \times h \Rightarrow 49.5 = 39.5 + \frac{50 - (14 + a)}{26} \times 20 \Rightarrow a = 23$

and N = 56 + a + b, i.e. 100 = 56 + 23 + b, so b = 21.

The cumulative frequency distribution becomes

Marks | No. of students (f) | c.f.
-0.5 to 19.5 | 14 | 14
19.5 to 39.5 | 23 | 37
39.5 to 59.5 | 26 | 63
59.5 to 79.5 | 21 | 84
79.5 to 99.5 | 16 | 100


ii) The limits of marks of the middle 60% students are given by $P_{20}$ and $P_{80}$.

For $P_{20}$: The position of $P_{20}$ is given by $\frac{20N}{100} = \frac{20 \times 100}{100} = 20$. The c.f. just greater than 20 is 37. So, $P_{20}$ lies in class interval (19.5 to 39.5). Then, we have,

L = 19.5, f = 23, c.f. = 14, h = 20

$P_{20} = L + \frac{\frac{20N}{100} - c.f.}{f} \times h = 19.5 + \frac{20 - 14}{23} \times 20 = 24.72$

For $P_{80}$: The position of $P_{80}$ = $\frac{80N}{100} = \frac{80 \times 100}{100} = 80$. The c.f. just greater than 80 is 84. So, $P_{80}$ lies in class (59.5 to 79.5).

L = 59.5, f = 21, c.f. = 63, h = 20

$P_{80} = L + \frac{\frac{80N}{100} - c.f.}{f} \times h = 59.5 + \frac{80 - 63}{21} \times 20 = 75.69$

So, the limits of marks obtained by the middle 60% students are 24.72 and 75.69, i.e. a spread of 75.69 - 24.72 = 50.97 marks.
iii. Highest marks of lower 25% students is the 25th percentile ($P_{25}$) of the marks distribution of the students.

For $P_{25}$:

The position of $P_{25}$ = $\frac{25N}{100} = \frac{25 \times 100}{100} = 25$
The c.f. just greater than 25 is 37. So, $P_{25}$ lies in class (19.5 to 39.5).
L = 19.5, f = 23, c.f. = 14, h = 20

$P_{25} = L + \frac{\frac{25N}{100} - c.f.}{f} \times h = 19.5 + \frac{25 - 14}{23} \times 20 = 29.07$

So, the highest marks of the lower 25% students is 29.07.


iv. Lowest marks of higher 25% students is the 75th percentile ($P_{75}$) of the marks distribution of the students.

For $P_{75}$:

The position of $P_{75}$ = $\frac{75N}{100} = \frac{75 \times 100}{100} = 75$
The c.f. just greater than 75 is 84. So, $P_{75}$ lies in class (59.5 to 79.5).
L = 59.5, f = 21, c.f. = 63, h = 20

$P_{75} = L + \frac{\frac{75N}{100} - c.f.}{f} \times h = 59.5 + \frac{75 - 63}{21} \times 20 = 70.9$

So, the lowest marks of the higher 25% students is 70.9.
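The percentile calculations above all follow the same interpolation rule, so they can be checked with a short script. The following is a minimal sketch, not part of the original text: the function name `grouped_percentile` and its argument layout are assumptions made for illustration, and the data are the reconstructed class boundaries and frequencies of this example (a = 23, b = 21).

```python
def grouped_percentile(boundaries, freqs, k):
    """k-th percentile of a continuous frequency distribution.

    boundaries: class boundaries (one more entry than freqs)
    freqs:      class frequencies
    k:          percentile number, 0 < k < 100
    (hypothetical helper, written only to illustrate the formula)
    """
    n = sum(freqs)
    position = k * n / 100      # position kN/100
    cum = 0                     # c.f. of the classes before the current one
    for i, f in enumerate(freqs):
        # Pick the class whose c.f. is just greater than the position.
        if cum + f > position or i == len(freqs) - 1:
            lower = boundaries[i]
            width = boundaries[i + 1] - boundaries[i]
            return lower + (position - cum) / f * width
        cum += f


# Data of the example after conversion to exclusive classes.
boundaries = [-0.5, 19.5, 39.5, 59.5, 79.5, 99.5]
freqs = [14, 23, 26, 21, 16]

for k in (20, 25, 50, 75, 80):
    print(k, round(grouped_percentile(boundaries, freqs, k), 2))
```

With these inputs the 50th percentile reproduces the given median of 49.5, which is a quick check that the missing frequencies were found correctly.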


Example 2.63 | Income of employees in an industrial concern is given below. Every employee belonging to the top 24% of the earners is required to pay 2% of his income to the workers' welfare fund. Find the average contribution to this fund by the workers.

Income (Rs.) | 0-50 | 50-100 | 100-150 | 150-200 | 200-250 | 250-300
No. of employees | 90 | 150 | 100 | 80 | 70 | 10

Solution: Here, N = 500
The number of workers belonging to the top 24% of the earners = 24% of 500 = 120

To find the lowest income of the top 24% earners, we have to find $P_{76}$.

Calculation of $P_{76}$

Income (Rs.) | Frequency (f) | c.f.
0-50 | 90 | 90
50-100 | 150 | 240
100-150 | 100 | 340
150-200 | 80 | 420
200-250 | 70 | 490
250-300 | 10 | 500

The position of $P_{76}$ is given by $\frac{76N}{100} = \frac{76 \times 500}{100} = 380$. From the c.f. table, the c.f. just greater than 380 is 420, which corresponds to the class interval (150-200). Thus $P_{76}$ lies in the class (150-200). Then, we have

L = 150, h = 50, c.f. = 340, f = 80

$P_{76} = L + \frac{\frac{76N}{100} - c.f.}{f} \times h = 150 + \frac{380 - 340}{80} \times 50 = 175$

Thus, the class of income of the top 24% workers is Rs. (175-300).
Now, we have to find the average contribution of the top 24% earners.

Income (Rs.) | Frequency (f) | Mid value (x) | fx
175-200 | 40 | 187.5 | 7500
200-250 | 70 | 225 | 15750
250-300 | 10 | 275 | 2750
Total | N = 120 | | $\sum fx$ = 26,000

Hence, the total income of the top 24% earners is Rs. 26,000 and the total contribution to the welfare fund = 2% of Rs. 26,000 = Rs. 520.

Average contribution per person to the welfare fund = $\frac{520}{120}$ = Rs. 4.33


Example 2.64 | The following table gives the frequency distribution of the marks of 200 examinees in an examination out of 100 marks:

Marks | Below 20 | 20-40 | 40-60 | 60-80 | 80 and above
No. of examinees | 20 | 35 | 80 | 40 | 25

i. Find the median and interpret the result.
ii. Find the number of examinees who obtain marks more than average.
iii. Find the number of examinees who get the marks below 45.
iv. If the minimum marks to pass the examination is 32, what percentage of examinees pass the examination?
v. It is assumed that 75% of the examinees is supposed to pass the examination, find the minimum pass marks for the examination.

Solution: The frequency distribution consists of open-end class intervals. Its amendment is not required for the calculation of the median, but for the calculation of the average (mean) its amendment is necessary. Thus the lower limit of the first class and the upper limit of the last class are chosen according to the sizes of the other class intervals of the distribution, i.e. the lower limit of the first class is 0 and the upper limit of the last class is 100.

Calculation of Average and Median

Class interval | Frequency (f) | Mid value (X) | fX | Less than c.f.
0-20 | 20 | 10 | 200 | 20
20-40 | 35 | 30 | 1050 | 55
40-60 | 80 | 50 | 4000 | 135
60-80 | 40 | 70 | 2800 | 175
80-100 | 25 | 90 | 2250 | 200
Total | N = 200 | | $\sum fX$ = 10300 |

i. Position of the median is given by $\frac{N}{2} = \frac{200}{2} = 100$. From the c.f. table, the c.f. just greater than 100 is 135. Thus the median lies in the class (40-60).

Median $(M_d) = L + \frac{\frac{N}{2} - c.f.}{f} \times h = 40 + \frac{100 - 55}{80} \times 20 = 51.25$

The value of median 51.25 means 50% of the examinees obtain marks less than 51.25 and the other 50% of the examinees obtain marks more than 51.25.

ii. Average marks $(\bar{X}) = \frac{\sum fX}{N} = \frac{10300}{200} = 51.5$

Here we assume that the frequencies are uniformly distributed within each class. We see that 51.5 lies in the interval (40-60).

Thus, the number of examinees obtaining marks more than average = $\frac{60 - 51.5}{20} \times 80 + 40 + 25 = 99$

iii. The number of examinees obtaining marks below 45 = $\frac{45 - 40}{20} \times 80 + 20 + 35 = 75$

iv. Here, pass marks = 32

The number of examinees obtaining marks above 32 = $\frac{40 - 32}{20} \times 35 + 80 + 40 + 25 = 159$

We see that 159 examinees pass the examination, i.e. $\frac{159}{200} \times 100\% = 79.5\%$ of the examinees pass the examination.

v. It is assumed that 75% of the examinees is supposed to pass the examination. That is, 75% of 200 = 0.75 × 200 = 150. It means 150 examinees should obtain marks above the minimum pass marks.

Let x be the minimum pass marks. Then

$\frac{40 - x}{20} \times 35 + 80 + 40 + 25 = 150 \Rightarrow \frac{40 - x}{20} \times 35 = 5 \Rightarrow x = \frac{1300}{35} = 37.14$ (approximately)

The minimum pass marks of the examination is 37.14 if 75% of the examinees is supposed to pass the examination.
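Parts ii to iv of the solution all count examinees above a cut-off by assuming that marks are spread uniformly inside each class. A minimal Python sketch of that interpolation is given below; the helper name `count_above` is an assumption introduced for illustration, and the open-end classes are closed at 0 and 100 as in the solution.

```python
def count_above(boundaries, freqs, x):
    """Estimated number of observations above x, assuming the frequency
    of each class is spread uniformly over the class (hypothetical helper)."""
    total = 0.0
    for lower, upper, f in zip(boundaries, boundaries[1:], freqs):
        if x <= lower:
            total += f                                  # whole class lies above x
        elif x < upper:
            total += (upper - x) / (upper - lower) * f  # part of the class above x
    return total


boundaries = [0, 20, 40, 60, 80, 100]
freqs = [20, 35, 80, 40, 25]
n = sum(freqs)

above_mean = count_above(boundaries, freqs, 51.5)   # marks more than average
below_45 = n - count_above(boundaries, freqs, 45)   # marks below 45
passed = count_above(boundaries, freqs, 32)         # pass marks = 32
print(above_mean, below_45, passed, passed / n * 100)
```

The same function can be used in reverse for part v: the pass mark is the value of x at which `count_above` returns 150.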
of the data. However, it may not be true if the values of the items are highly scattered or spread. In other words, averages or measures of central tendency give us the idea of concentration of the items (observations) about the central part of the distribution. But measures of central tendency do not give information about how spread or scattered the data are around the average value. Two or more distributions having the same average may differ in the scatteredness or variability of the observations from the central value (average). Thus measures of dispersion are discussed. A measure of dispersion provides information regarding the amount of variability, inequality, deviation or scatteredness of the data from the average (central value). The study of scatteredness or variation of the data from the averages of a distribution is called the study of 'measures of dispersion'.

The study of averages is not enough to analyze the nature of a given frequency distribution. A further analysis of the distribution is necessary if we are to know how representative the average is.


For example, the following are the marks obtained by two students in 5 different tests. 
[Table: marks obtained by students A and B in the five tests]


Here the mean and median marks of both students A and B are the same. The differences of B's marks from the average marks are larger than the differences of A's marks from the average marks.

This means A's performance is more consistent than that of B. Therefore the measure of central value (average) alone does not present the whole characteristics of the distribution of data unless the variation or the dispersion of the individual values is considered. Hence, a measure of dispersion or variation is another important aspect of statistical analysis.

Thus, the measures of central tendency alone are inadequate to describe (or characterize) the distribution perfectly and sufficiently. Therefore, the measures of central tendency must be supported and supplemented by some other measures. One such measure is dispersion or variability. The definitions of dispersion given by prominent statisticians are as follows:

"Dispersion or spread is the degree of the scatter or variation of the variables about a central value." - B.C. Brooks & W.F.L. Dick

"Dispersion is a measure of the extent to which the individual items vary." - L.R. Connor

"The degree to which numerical data tend to spread about an average value is called the variation or dispersion of the data." - Spiegel

Thus dispersion measures the variability of statistical data from a central value (mean, median or mode).

The Objectives of Measures of Dispersion are as Follows:

i) To find out the reliability of an average.
ii) To control the variation of the data from the central value of the distribution.
iii) To provide the comparison of variability of two or more than two distributions.
iv) To facilitate the use of other statistical measures, such as regression and correlation analysis, for further analysis of the distribution.
v) To help in devising a system of quality control.

Characteristics of Good (Ideal) Measures of Dispersion (Requisites of an Ideal/Good Measure of Dispersion)

The characteristics of good (ideal) measures of dispersion are as follows:

i) It should be rigidly defined.
ii) It should be easy to calculate and understand.
iii) It should be based on all observations.
iv) It should be amenable and suitable for further mathematical treatment.
v) It should be least affected by fluctuations of sampling.
vi) It should not be affected much by extreme values of the distribution.

2.11.1 Absolute and Relative Measures of Dispersion


Absolute Measures of Dispersion 


The measures of dispersion which depend on the original units of measurement of the data are said to be absolute measures of dispersion. The absolute measures of dispersion can be used only for comparing the variability or dispersion of two or more distributions having the same unit or scale. If the distributions are given in different units or scales, then the absolute measures of dispersion cannot be used for comparison.

For example, Income x: Rs. 800
Income y: Rs. 300
Absolute measure of dispersion = Rs. 800 - Rs. 300 = Rs. 500

Relative Measures of Dispersion

The measures of dispersion which are pure numbers, independent of the units of measurement of the data, are said to be relative measures of dispersion. Thus, the relative measures of dispersion do not have any unit of measurement and are obtained by taking a ratio or percentage of an absolute measure of dispersion to a suitable average, i.e. the ratio of two absolute values.

Relative value = $\frac{\text{Rs. } 800}{\text{Rs. } 300} = \frac{8}{3}$

Therefore, for comparing the variability of two or more than two distributions, even if they are measured in different units and scales, the relative measures of dispersion are computed instead of the absolute measures of dispersion.


The examples of absolute measures of dispersion and relative measures of dispersion are listed as follows.

Absolute Measure | Relative Measure
1. Range | 1. Coefficient of range
2. Quartile deviation or Semi-inter-quartile range | 2. Coefficient of quartile deviation
3. Mean deviation or Average deviation | 3. Coefficient of mean deviation
4. Standard deviation | 4. Coefficient of standard deviation
5. Variance | 5. Coefficient of variation


2.11.2 Types of Measurement of Dispersion

1. (a) Range (b) Coefficient of Range
2. Quartile Deviation (Semi-inter quartile Range)
3. Average Deviation
4. Standard Deviation
5. (a) Variance (b) Co-efficient of Variation
6. Lorenz curve

Among the above measures of dispersion, range is the simplest, average (mean) deviation is better and standard deviation is the best.

Range


The simplest method of studying and measuring the dispersion is range. It is defined as the difference between the extreme values of the distribution. In other words, the difference between the largest (maximum) and smallest (minimum) item/observation of the distribution is range. It is given by

Range (R) = L - S

In case of continuous series of the data, the range is obtained as the difference between the upper limit of the highest class and the lower limit of the lowest class.

Range is an absolute measure of dispersion. So the relative measure of dispersion corresponding to the range, used to compare two distributions, is known as co-efficient of range, which is defined by

Co-efficient of range = $\frac{L - S}{L + S}$

Where, L is the largest item and S is the smallest item.
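As a quick illustration of the two formulas, the sketch below computes the range and the co-efficient of range in Python. The function names are assumptions chosen for this illustration, and the data are the series X: 4, 6, 8, 14, 18 that is used later in Example 2.65.

```python
def value_range(values):
    """Range R = L - S (largest item minus smallest item)."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Co-efficient of range = (L - S) / (L + S), a unit-free number."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


data = [4, 6, 8, 14, 18]
print(value_range(data))                      # 18 - 4 = 14
print(round(coefficient_of_range(data), 3))   # 14 / 22
```

Only the two extreme items enter the calculation, which is why the range ignores how the remaining items are distributed.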


Merits and Demerits of Range
Merits of Range are as follows:

i) It is rigidly defined.
ii) It is simple to understand and easy to calculate.
iii) Its computation is very fast. Hence, if we want a quick rather than a very accurate picture of variability, we may compute the range.
iv) It is mostly used for meteorological data such as data related to temperature, rainfall etc.
v) Range is used in industry for statistical quality control of manufactured products by the construction of the R-chart, i.e. control chart for range.
vi) Range is also useful in studying the variations in the prices of stocks and shares and other commodities that are sensitive to price changes from one period to another.

Demerits of range are as follows:

i) It is not based on all the observations. That is, it considers only the largest and smallest values of the distribution.
ii) It cannot be calculated in case of a frequency distribution with open-end classes.
iii) It is much affected by fluctuations of sampling.
iv) It is affected by extreme observations.
v) It is not suitable for further mathematical treatment.

In the words of W.I. King, "It is too indefinite to be used as a practical measure of dispersion".

Despite some demerits, the range is useful and applicable in various fields like stock market fluctuations and the variations in money rates and rates of exchange. The uses of range are:

i) It is used in industry for the quality control of the product.
ii) It is most widely used in our daily life as probable limits in the form of a range.
iii) It is also used by the meteorological department for weather forecasts.


2.11.3 Quartile Deviation/Semi-inter-Quartile Range

Quartile deviation is the measure of dispersion based on the partition values (quartiles) of the data. The difference between the third quartile (upper quartile) and the first quartile (lower quartile) of the data is called the inter-quartile range.

Inter-quartile range = $Q_3 - Q_1$

Quartile deviation is half of the inter-quartile range and thus is called the semi-inter-quartile range. It is denoted by Q.D.

Quartile Deviation (Q.D.) = $\frac{1}{2}(Q_3 - Q_1)$

It is also expressed as Q.D. = $\frac{1}{2}[(Q_3 - M_d) + (M_d - Q_1)]$

So, quartile deviation gives the average of the deviations of the quartiles taken from the median of the data. Since, in a distribution, 25% of the observations lie below $Q_1$ and 25% of the observations lie above $Q_3$, 50% of the observations lie between $Q_1$ and $Q_3$. Therefore, for a symmetrical distribution, $M_d \pm$ Q.D. covers exactly 50% of the observations.

Quartile deviation is an absolute measure of dispersion. So the relative measure of dispersion corresponding to the Q.D. is the co-efficient of Q.D., which is defined as

Co-efficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$

Note: Quartile deviation is the most suitable measure for a frequency distribution having open-end class intervals.
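The two quartile-based measures are simple enough to sketch in a few lines of Python. The function names below are assumptions for illustration; the quartile values $Q_1 = 30$ and $Q_3 = 50$ are the ones obtained later in Example 2.71.

```python
def quartile_deviation(q1, q3):
    """Q.D. = (Q3 - Q1) / 2, the semi-inter-quartile range."""
    return (q3 - q1) / 2


def coefficient_of_qd(q1, q3):
    """Co-efficient of Q.D. = (Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)


q1, q3 = 30, 50
print(q3 - q1)                     # inter-quartile range
print(quartile_deviation(q1, q3))  # half of the inter-quartile range
print(coefficient_of_qd(q1, q3))   # relative measure, free of units
```

Because only $Q_1$ and $Q_3$ are needed, the calculation works even when the first or last class of the distribution is open-ended.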


Merits and Demerits of Quartile Deviation
Merits:
i) It is quite easy to calculate and understand.
ii) It is a better measure of dispersion than range.
iii) It is not affected by extreme observations.
iv) It is the only measure of dispersion which can be calculated even when the frequency distribution has open-end classes.
v) It is very useful when it is desired to know the variability in the central half part of the data.

Demerits:
i) It is not based on all observations, because it ignores 25% of the data at the lower end and 25% of the data at the upper end of the distribution.
ii) It is affected by fluctuations of sampling.
iii) It is not suitable for further mathematical treatment.


2.11.4 Mean Deviation

Range and quartile deviation are not good measures of dispersion, as all the items are not included in either case and they do not show the variation of the items or observations from an average. Thus, they do not reflect the formation of the whole distribution. So, to overcome both these drawbacks, another measure of dispersion is developed, which is known as mean deviation, average deviation or mean absolute deviation. Generally, it is denoted by M.D.

Mean deviation is defined as the average (arithmetic mean) of the positive deviations (differences) of the items taken from any one of the averages (mean, median or mode).

Merits and Demerits of Average Deviation (M.D.)
Merits of M.D. are as follows:

i) It is easy to calculate and understand.
ii) It is based on all the items and is thus a better measure of dispersion than the range and Q.D.
iii) It is less affected by extreme observations in comparison to standard deviation.
iv) It provides a better measure for comparison about the formation of different distributions.

Demerits of M.D. are as follows:

i) It ignores the negative signs.
ii) In case of a skewed frequency distribution, the mean deviation from mode is not satisfactory.
iii) It cannot be computed for distributions with open-end classes.
iv) It is affected by fluctuations of sampling.


2.11.5 Standard Deviation 


Standard deviation is defined as the positive square root of the average of the squares of the deviations of the items from their arithmetic mean. It is also termed the root mean squared deviation from the mean of the data. It was first suggested by Karl Pearson in 1893. It is usually denoted by $\sigma$ (small sigma) of the Greek alphabet. Since it satisfies most of the requisites of a good measure of dispersion, it is regarded as the best measure of dispersion (or ideal measure of dispersion). It is the most powerful and most widely used measure of dispersion.


Computation of Standard Deviation
Standard deviation is computed by using the following methods.

Short-cut method: $\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$

Step deviation method: $\sigma = \sqrt{\frac{\sum fd'^2}{N} - \left(\frac{\sum fd'}{N}\right)^2} \times h$

Where, $d = X - A$ or $d = M - A$, A = assumed mean, $d' = \frac{X - A}{h}$ or $d' = \frac{M - A}{h}$, and h = size of class interval (or common factor/multiplier).

Standard deviation is an absolute measure of dispersion. Thus, the relative measure of dispersion corresponding to standard deviation is the co-efficient of standard deviation, which is defined as

Co-efficient of S.D. = $\frac{\sigma}{\bar{X}}$

By the definition of standard deviation, the formulas for the different types of series are as below:

Individual Series: S.D. $(\sigma) = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2}$

Discrete Series: S.D. $(\sigma) = \sqrt{\frac{\sum f(X - \bar{X})^2}{N}} = \sqrt{\frac{\sum fX^2}{N} - \left(\frac{\sum fX}{N}\right)^2}$

Continuous Series: S.D. $(\sigma) = \sqrt{\frac{\sum f(M - \bar{X})^2}{N}} = \sqrt{\frac{\sum fM^2}{N} - \left(\frac{\sum fM}{N}\right)^2}$

where M = Mid value of the class interval.
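The two forms of each formula (the definition and the direct form) always give the same value, which can be confirmed numerically. The following is a minimal Python sketch; the function names are assumptions made for this illustration, and the data are taken from Example 2.65 (individual series) and Example 2.68 (continuous series, using mid values).

```python
import math


def sd_individual(xs):
    """Return the S.D. of an individual series computed two ways:
    from the definition and from the direct (sum of squares) form."""
    n = len(xs)
    mean = sum(xs) / n
    by_definition = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    by_direct = math.sqrt(sum(x * x for x in xs) / n - (sum(xs) / n) ** 2)
    return by_definition, by_direct


def sd_grouped(values, freqs):
    """S.D. of a discrete series, or of a continuous series when
    `values` holds the mid values M of the classes."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n)


a, b = sd_individual([4, 6, 8, 14, 18])
print(round(a, 2), round(b, 2))
print(round(sd_grouped([10, 30, 50, 70, 90], [10, 12, 15, 8, 5]), 2))
```

Note that all of these formulas divide by n (or N), so they give the population form of the standard deviation used throughout this chapter.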


Merits and Demerits of Standard Deviation
Merits of standard deviation are as follows:
i) It is rigidly defined.
ii) It is based on all the observations.
iii) It is affected as little as possible by fluctuations of sampling.
iv) It is suitable for further mathematical treatment.
v) It satisfies almost all requisites of a good measure of dispersion.
vi) It is the best and most powerful measure of dispersion.
Demerits of standard deviation are as follows:
i) It is quite difficult to compute.
ii) It gives greater weight to extreme values and less weight to those values which are close to the mean.
iii) It cannot be calculated for a distribution with open-end classes.
iv) It is not comprehensible for a layman.


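Example 2.65 | Find standard deviation of the following series by using
(a) Actual mean method
(b) Direct method
(c) Short cut method (or assumed mean method)
X: 4, 6, 8, 14, 18

Solution: a) Actual mean method

X | $X - \bar{X}$ | $(X - \bar{X})^2$
4 | -6 | 36
6 | -4 | 16
8 | -2 | 4
14 | 4 | 16
18 | 8 | 64
$\sum X$ = 50 | | $\sum (X - \bar{X})^2$ = 136

Now, Mean $(\bar{X}) = \frac{\sum X}{n} = \frac{50}{5} = 10$

Standard Deviation $(\sigma) = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{136}{5}} = 5.22$

b) Direct method

X | $X^2$
4 | 16
6 | 36
8 | 64
14 | 196
18 | 324
$\sum X$ = 50 | $\sum X^2$ = 636

Standard Deviation $(\sigma) = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2} = \sqrt{\frac{636}{5} - \left(\frac{50}{5}\right)^2} = \sqrt{127.2 - 100} = 5.22$

c) Short cut method
Let assumed mean (A) = 8

X | d = X - 8 | $d^2$
4 | -4 | 16
6 | -2 | 4
8 | 0 | 0
14 | 6 | 36
18 | 10 | 100
 | $\sum d$ = 10 | $\sum d^2$ = 156

Standard Deviation $(\sigma) = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} = \sqrt{\frac{156}{5} - \left(\frac{10}{5}\right)^2} = \sqrt{31.2 - 4} = 5.22$

Example 2.66 | Find the S.D. of the following data.

12, 13, 15, 16, 18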

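What will be the value of S.D. if each item is increased by 2? Also what will be the value of S.D. if each element is multiplied by 2?

Solution: Case I: Let assumed mean of the data be 15, i.e. A = 15.

Calculation of the S.D. of the given data:

Variable (X) | d = X - 15 | $d^2$
12 | -3 | 9
13 | -2 | 4
15 | 0 | 0
16 | 1 | 1
18 | 3 | 9
 | $\sum d$ = -1 | $\sum d^2$ = 23

S.D. $(\sigma) = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} = \sqrt{\frac{23}{5} - \left(\frac{-1}{5}\right)^2} = \sqrt{4.56} = 2.14$

Case II: If each item is increased by 2, the original data becomes 14, 15, 17, 18, 20.
Let assumed mean of the data be 17, i.e. A = 17.

Calculation of the S.D. of the given data:

Variable (X) | d = X - 17 | $d^2$
14 | -3 | 9
15 | -2 | 4
17 | 0 | 0
18 | 1 | 1
20 | 3 | 9
 | $\sum d$ = -1 | $\sum d^2$ = 23

Standard Deviation $(\sigma) = \sqrt{\frac{23}{5} - \left(\frac{-1}{5}\right)^2} = \sqrt{4.56} = 2.14$

Case III: If each element is multiplied by 2, the original data becomes 24, 26, 30, 32, 36.
Let assumed mean of the data be 30 and common factor be 2, i.e. A = 30, h = 2. Then $d' = \frac{X - 30}{2}$.

Calculation of S.D. of the data:

Variable (X) | $d' = \frac{X - 30}{2}$ | $d'^2$
24 | -3 | 9
26 | -2 | 4
30 | 0 | 0
32 | 1 | 1
36 | 3 | 9
 | $\sum d'$ = -1 | $\sum d'^2$ = 23

S.D. $(\sigma) = \sqrt{\frac{\sum d'^2}{n} - \left(\frac{\sum d'}{n}\right)^2} \times h = \sqrt{\frac{23}{5} - \left(\frac{-1}{5}\right)^2} \times 2 = 4.27$

Conclusion: S.D. is affected by change of scale but not by change of origin.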
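The conclusion can be verified directly with Python's standard library. This is a small sketch using `statistics.pstdev`, which divides by n as the formulas in this chapter do; the variable names are chosen only for this illustration.

```python
from statistics import pstdev

marks = [12, 13, 15, 16, 18]
shifted = [x + 2 for x in marks]   # change of origin: each item increased by 2
scaled = [x * 2 for x in marks]    # change of scale: each item multiplied by 2

print(round(pstdev(marks), 2), round(pstdev(shifted), 2), round(pstdev(scaled), 2))
# 2.14 2.14 4.27
```

Adding a constant leaves every deviation from the mean unchanged, while multiplying by a constant multiplies every deviation, and hence the S.D., by that constant.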
Example 2.67 | Find S.D. of the following data. 


Variable (X) 14 20 
ps ee a 


Solution: Let assumed mean of the data be 16 and common factor (multiplier) be 2, i.e. A = 16 and h = 2.
Then, we have
Example 2.68 | Find out the standard deviation from the following distribution (using step deviation method).

Wages | 0-20 | 20-40 | 40-60 | 60-80 | 80-100
No. of workers | 10 | 12 | 15 | 8 | 5

Solution: Let assumed mean be 50 and size of class interval be 20, i.e. A = 50 and h = 20, then $d' = \frac{M - A}{h}$.

Computation of standard deviation of the data

Wages | No. of workers (f) | Mid value (M) | $d' = \frac{M - 50}{20}$ | $fd'$ | $fd'^2$
0-20 | 10 | 10 | -2 | -20 | 40
20-40 | 12 | 30 | -1 | -12 | 12
40-60 | 15 | 50 | 0 | 0 | 0
60-80 | 8 | 70 | 1 | 8 | 8
80-100 | 5 | 90 | 2 | 10 | 20
 | N = $\sum f$ = 50 | | | $\sum fd'$ = -14 | $\sum fd'^2$ = 80

Standard Deviation $(\sigma) = \sqrt{\frac{\sum fd'^2}{N} - \left(\frac{\sum fd'}{N}\right)^2} \times h = \sqrt{\frac{80}{50} - \left(\frac{-14}{50}\right)^2} \times 20 = \sqrt{1.5216} \times 20 = 24.67$
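The step deviation method of this example can be sketched in Python as follows. The function name `sd_step_deviation` and its parameters are assumptions made for illustration; the mid values and frequencies are those of the wage distribution above.

```python
import math


def sd_step_deviation(mids, freqs, assumed_mean, h):
    """S.D. by the step deviation method:
    sigma = sqrt(sum(f d'^2)/N - (sum(f d')/N)^2) * h, with d' = (M - A)/h."""
    n = sum(freqs)
    steps = [(m - assumed_mean) / h for m in mids]          # d'
    sum_fd = sum(f * d for f, d in zip(freqs, steps))       # sum of f d'
    sum_fd2 = sum(f * d * d for f, d in zip(freqs, steps))  # sum of f d'^2
    return math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * h


mids = [10, 30, 50, 70, 90]
freqs = [10, 12, 15, 8, 5]
sigma = sd_step_deviation(mids, freqs, assumed_mean=50, h=20)
print(round(sigma, 2))
```

The result does not depend on the choice of A or h; a different assumed mean, such as 30, gives the same standard deviation.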


2.11.6 Variance

The square of standard deviation is known as variance. It is denoted by $\sigma^2$ or $\mu_2$. Variance is computed by squaring the standard deviation. It is clear that variance is never negative and is measured in terms of square units of the given data. The concept of variance is very useful in advanced statistical work and has very important applications in inferential statistics.

The relative measure of dispersion corresponding to the variance is the co-efficient of variation, which is usually denoted by C.V. and is given by

C.V. = $\frac{\sigma}{\bar{X}} \times 100\%$

C.V. is the most widely used relative measure of dispersion in comparing two or more than two distributions. In a comparison between two or more distributions, the distribution with the lower C.V. is supposed to be more homogeneous, more consistent, more uniform, more regular, more equitable, more representative, more stable or less variable.

According to Karl Pearson, coefficient of variation is the "percentage variation in the mean". It is a relative measure of dispersion, so it is independent of units of measurement. It is always expressed in percentage. Therefore, C.V. can be used to compare two or more than two distributions with regard to their variability, consistency, uniformity, homogeneity, equitability, stability etc.

Coefficient of variation is applicable for the comparison of variability of two or more than two distributions (series) as follows:

Less C.V. is considered as | More C.V. is considered as
More consistent | Less consistent
More homogeneous (less heterogeneous) | Less homogeneous (more heterogeneous)
More uniform | Less uniform
More stable | Less stable
More representative to mean | Less representative to mean
More equitable | Less equitable
Less variable | More variable
Less disparity | More disparity
More regular | Less regular

Note: The relation between different measures of dispersion is 4 S.D. = 5 M.D. = 6 Q.D. and Range = 6 S.D. (approximately).
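A comparison by C.V. can be sketched in a few lines of Python. The helper name `coefficient_of_variation` is an assumption for illustration, and the figures (variances 78 and 98, means Rs. 182 and Rs. 178.50) are those of factories M and N in Example 2.70.

```python
import math


def coefficient_of_variation(sd, mean):
    """C.V. = (sigma / mean) * 100, expressed as a percentage."""
    return sd / mean * 100


cv_m = coefficient_of_variation(math.sqrt(78), 182)     # factory M
cv_n = coefficient_of_variation(math.sqrt(98), 178.50)  # factory N

# The series with the smaller C.V. is the more uniform (consistent) one.
more_uniform = "M" if cv_m < cv_n else "N"
print(round(cv_m, 2), round(cv_n, 2), more_uniform)
```

Because the C.V. is a pure percentage, the same comparison would remain valid even if the two series were measured in different units.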
Properties of Standard Deviation
Property 1: Standard deviation is the least possible value of root mean square deviation.
OR
In other words, the root mean square deviation is not less than the standard deviation.
OR
Mean square deviation is not less than the variance.
Property 2: Standard deviation is independent of change of origin but not of scale.
Property 3: For identical observations, standard deviation is zero.
Property 4: For any discrete distribution, standard deviation is not less than mean deviation from mean,

i.e., $\sqrt{\frac{1}{N}\sum f(X - \bar{X})^2} \geq \frac{1}{N}\sum f|X - \bar{X}|$


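Property 5: Combined standard deviation
Let $\bar{X}_1$ and $\bar{X}_2$ be the means of the first and second series, and $\sigma_1^2$ and $\sigma_2^2$ be the variances of the first and second series with $n_1$ and $n_2$ observations respectively. Then the combined standard deviation of these two series taken together is given by

$\sigma_{12} = \sqrt{\frac{n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 d_1^2 + n_2 d_2^2}{n_1 + n_2}}$

Where, $d_1 = \bar{X}_1 - \bar{X}_{12}$, $d_2 = \bar{X}_2 - \bar{X}_{12}$ and

$\bar{X}_{12} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}$

Similarly, the combined standard deviation for three series is given by

$\sigma_{123} = \sqrt{\frac{n_1\sigma_1^2 + n_2\sigma_2^2 + n_3\sigma_3^2 + n_1 d_1^2 + n_2 d_2^2 + n_3 d_3^2}{n_1 + n_2 + n_3}}$

Where, $d_1 = \bar{X}_1 - \bar{X}_{123}$, $d_2 = \bar{X}_2 - \bar{X}_{123}$, $d_3 = \bar{X}_3 - \bar{X}_{123}$ and

$\bar{X}_{123} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + n_3\bar{X}_3}{n_1 + n_2 + n_3}$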
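The combined formulas extend to any number of series, which the following Python sketch makes explicit. The function name `combined_mean_sd` is an assumption for illustration. The inputs are the figures of Example 2.69, where the second standard deviation is taken as 11; that value is an assumption, chosen because it is consistent with the printed result of 11.46.

```python
import math


def combined_mean_sd(sizes, means, sds):
    """Combined mean and combined S.D. of several series:
    variance = sum(n_i * (sigma_i^2 + d_i^2)) / sum(n_i),
    where d_i = mean_i - combined mean."""
    n = sum(sizes)
    mean = sum(k * m for k, m in zip(sizes, means)) / n
    variance = sum(
        k * (s ** 2 + (m - mean) ** 2) for k, m, s in zip(sizes, means, sds)
    ) / n
    return mean, math.sqrt(variance)


# Two factories: sizes 100 and 500, means 50 and 60, S.D. 10 and 11 (assumed).
mean_12, sd_12 = combined_mean_sd([100, 500], [50, 60], [10, 11])
print(round(mean_12, 2), round(sd_12, 2))
```

Passing three sizes, means and standard deviations gives the three-series formula without any change to the code.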


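Example 2.69 | Calculate the combined mean and standard deviation from the following information.

 | Factory A | Factory B
No. of workers | 100 | 500
Daily mean wage (Rs.) | 50 | 60
Standard deviation (Rs.) | 10 | 11

Solution: Here, $N_1 = 100$, $N_2 = 500$, $\bar{X}_1 = 50$, $\bar{X}_2 = 60$, $\sigma_1 = 10$ and $\sigma_2 = 11$.

Now,

Combined Mean $(\bar{X}_{12}) = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2} = \frac{100 \times 50 + 500 \times 60}{100 + 500}$ = Rs. 58.3

And $d_1 = \bar{X}_1 - \bar{X}_{12} = 50 - 58.3 = -8.3$

$d_2 = \bar{X}_2 - \bar{X}_{12} = 60 - 58.3 = 1.7$

Combined S.D. $(\sigma_{12}) = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}} = \sqrt{\frac{100 \times 10^2 + 500 \times 11^2 + 100 \times (-8.3)^2 + 500 \times (1.7)^2}{100 + 500}} = 11.46$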
Example 2.70 | An analysis of monthly wages paid to the workers in two factories M and N belonging to the same industry gives the following results.

 | Factory M | Factory N
No. of workers | 600 | 700
Average monthly wage (Rs.) | 182 | 178.50
Variance of distribution of wage | 78 | 98

i) Which factory, M or N, has a larger wage bill?
ii) In which factory, M or N, is there more uniformity in the distribution of wages?

Solution: For Factory M

Here, n = 600, $\bar{X}_M = 182$, $\sigma_M^2 = 78$, i.e. $\sigma_M = \sqrt{78} = 8.83$
Total wage bill = $\sum X_M = n \cdot \bar{X}_M$ = 600 × 182 = Rs. 1,09,200

Co-efficient of variation $(C.V._M) = \frac{\sigma_M}{\bar{X}_M} \times 100 = \frac{8.83}{182} \times 100 = 4.85\%$

For Factory N
Here, n = 700, $\bar{X}_N = 178.50$, $\sigma_N^2 = 98$, i.e. $\sigma_N = \sqrt{98} = 9.90$

Total wage bill = $\sum X_N = n \cdot \bar{X}_N$ = 700 × 178.50 = Rs. 1,24,950

Co-efficient of variation $(C.V._N) = \frac{\sigma_N}{\bar{X}_N} \times 100 = \frac{9.90}{178.50} \times 100 = 5.55\%$

i) Since $\sum X_N > \sum X_M$, factory N has a larger wage bill by Rs. (1,24,950 - 1,09,200) = Rs. 15,750.
ii) Since $C.V._M < C.V._N$, there is more uniformity in the distribution of wages in factory M.


Example 2.71 | Compute appropriate measures of dispersion from the following table.

Class | Below 10 | 10-20 | 20-30 | 30-40 | 40-50 | 50 and above
Frequency | 5 | 8 | 7 | 12 | 28 | 20

Solution: The given frequency distribution has open-end classes, so a measure of dispersion based on partition values (quartiles) is appropriate, i.e. Quartile Deviation is appropriate.
Here, we construct the cumulative frequency distribution as shown below and calculate the Q.D.

Class | Frequency (f) | c.f.
Below 10 | 5 | 5
10-20 | 8 | 13
20-30 | 7 | 20
30-40 | 12 | 32
40-50 | 28 | 60
50 and above | 20 | 80

Position of $Q_1$ is given by $\frac{N}{4} = \frac{80}{4} = 20$. From the c.f. table, the c.f. just greater than 20 is 32. Thus $Q_1$ lies in the class interval (30-40).

Now, $Q_1 = L + \frac{\frac{N}{4} - c.f.}{f} \times h = 30 + \frac{20 - 20}{12} \times 10 = 30$

And position of $Q_3$ is given by $\frac{3N}{4} = \frac{3 \times 80}{4} = 60$. From the c.f. table, the c.f. just greater than 60 is 80. Thus $Q_3$ lies in the class interval (50-60).

$Q_3 = L + \frac{\frac{3N}{4} - c.f.}{f} \times h = 50 + \frac{60 - 60}{20} \times 10 = 50$

Therefore, Quartile Deviation (Q.D.) = $\frac{Q_3 - Q_1}{2} = \frac{50 - 30}{2} = 10$


Example 2.72 | The following are the runs by two cricketers in 10 matches. 


Matches | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
If the consistency of performance is the criterion for selecting in National team, which 
cricketer should be selected? 


Solution: To test the consistency of performance, the C.V. should be calculated. 


For cricketer A, Assumed Mean $(A_1)$ = 30, $n_1$ = 10

Mean $(\bar{X}_A) = A_1 + \frac{\sum d_1}{n_1} = 30 + \frac{7}{10} = 30.7$


Standard deviation (0,4) = = EB 
“) 


_ 4d 
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hus For cricketer B Assumed Me an (A, 2) = 30.7 10 eRe ae emer! 


Xx; _ ay: =X, Ay | i 
P j 6A 
. | ats 
4 | 16 
3 9 
~2 | 4 | 
) is 0 0 
4 | 16 
6 | 36 
) | af 
~ _| 100 
Ld=5 | ¥d,? = 345 | 


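Standard deviation $(\sigma_B) = \sqrt{\frac{\sum d_2^2}{n_2} - \left(\frac{\sum d_2}{n_2}\right)^2} = \sqrt{\frac{345}{10} - \left(\frac{5}{10}\right)^2} = 5.85$

$C.V._B = \frac{\sigma_B}{\bar{X}_B} \times 100\% = \frac{5.85}{30.5} \times 100\% = 19.18\%$

Since $C.V._A < C.V._B$, the performance of A is more consistent than the performance of B. Therefore, cricketer A should be selected.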

Example 2.73 | A sample of 500 cars of each of two makes X and Y is taken and the average running life in years is recorded.

Life (no. of years)
No. of cars

If the prices of the cars are the same, which make of car should be preferred by the buyer?
Solution: To decide which make should be preferred by the buyer, we should calculate the C.V. of each make.


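For make X
Calculation of Mean and Standard Deviation

Totals: $\sum fd = -240$, $\sum fd^2 = 2640$

Here, Assumed Mean (A) = 5, N = 500

Mean $(\bar{X}_X) = A + \frac{\sum fd}{N} = 5 + \frac{-240}{500} = 5 - 0.48 = 4.52$

Standard deviation $(\sigma_X) = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} = \sqrt{\frac{2640}{500} - \left(\frac{-240}{500}\right)^2} = 2.247$

$C.V._X = \frac{\sigma_X}{\bar{X}_X} \times 100\% = \frac{2.247}{4.52} \times 100\% = 49.71\%$

For make Y

Mid value (m) | f | d = m - 5 | fd | $fd^2$
1 | 60 | -4 | -240 | 960
3 | 100 | -2 | -200 | 400
5 | 200 | 0 | 0 | 0
7 | 120 | 2 | 240 | 480
9 | 20 | 4 | 80 | 320
 | N = 500 | | $\sum fd$ = -120 | $\sum fd^2$ = 2160

Here, Assumed Mean (A) = 5, N = 500

Mean $(\bar{X}_Y) = A + \frac{\sum fd}{N} = 5 + \frac{-120}{500} = 5 - 0.24 = 4.76$

Standard deviation $(\sigma_Y) = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} = \sqrt{\frac{2160}{500} - \left(\frac{-120}{500}\right)^2} = 2.06$

$C.V._Y = \frac{\sigma_Y}{\bar{X}_Y} \times 100\% = \frac{2.06}{4.76} \times 100\% = 43.28\%$

Here, we see that $C.V._Y < C.V._X$, so make Y is more consistent in life duration than make X. Thus make Y should be preferred by the buyer.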


Example 2.74 | An association doing charity work decided to give old age pension to people of 60 years and above in age. The scales of pension are fixed as follows:


Age group Amount of monthly pension (in Rs.) 


60 — 65 
65 — 70 
70-75 
75 — 80 
80 — 85 
85 and above 


tion: First we have to construct frequency distrib slione Descriptive Statistics 77 


; Pension j 
Age group (in yrs) ton inTally 
Rs. (X) bars 
60 — 65 


solu 


= 400 
me at 
70-75 [| 6 
a ae ae ae 
85 and above La 2400 1920000 


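Mean pension $(\bar{X}) = \frac{\sum fX}{N} = \frac{20,300}{30}$ = Rs. 676.67

And standard deviation $(\sigma) = \sqrt{\frac{\sum fX^2}{N} - \left(\frac{\sum fX}{N}\right)^2} = \sqrt{\frac{14750000}{30} - \left(\frac{20300}{30}\right)^2}$ = Rs. 183.82 (approximately)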
Following are the marks obtained by two students A and B in 10 test of 100 marks each. 

ett 1 11213[415|6|7[8]9 [00] 
[Marks of 4 | 54 | 58 | 62 | 64 | 66 | 70 | 78 | 82 | 86 | 90 | 
[Marks of B___| 58 | 61 | 64 | 67 | 70 | 73 | 76 [ 79 | 82 | 85 | 
(a) Who is better? (b) Who is intelligent? 

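(c) If the consistency of performance is the criterion for awarding a prize, who should get the prize?
Solution: Arrange the marks of A and B in ascending order.


For student A | | | For student B | |
X | d = X - 70 | d² | X | d = X - 70 | d²
54 | -16 | 256 | 58 | -12 | 144
58 | -12 | 144 | 61 | -9 | 81
62 | -8 | 64 | 64 | -6 | 36
64 | -6 | 36 | 67 | -3 | 9
66 | -4 | 16 | 70 | 0 | 0
70 | 0 | 0 | 73 | 3 | 9
78 | 8 | 64 | 76 | 6 | 36
82 | 12 | 144 | 79 | 9 | 81
86 | 16 | 256 | 82 | 12 | 144
90 | 20 | 400 | 85 | 15 | 225
 | $\sum d = 10$ | $\sum d^2 = 1380$ | | $\sum d = 15$ | $\sum d^2 = 765$


(a) $\bar{X}_A = A + \frac{\sum d}{n} = 70 + \frac{10}{10} = 71$ and $\bar{X}_B = A + \frac{\sum d}{n} = 70 + \frac{15}{10} = 71.5$.


Since $\bar{X}_A < \bar{X}_B$, B is better.


(b) For student A,
$M_d(A)$ = value of $\left(\frac{n+1}{2}\right)^{th}$ item = value of $\left(\frac{10+1}{2}\right)^{th}$ item = value of $(5.5)^{th}$ item


$M_d(A) = \frac{66 + 70}{2} = 68$ marks


Similarly, for student B,


$M_d(B)$ = value of $\left(\frac{n+1}{2}\right)^{th}$ item = value of $\left(\frac{10+1}{2}\right)^{th}$ item = value of $(5.5)^{th}$ item


$M_d(B) = \frac{70 + 73}{2} = 71.5$ marks
Since $M_d(A) < M_d(B)$,
B is more intelligent than A.
(c) For student A,


S.D. $\sigma(A) = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} = \sqrt{\frac{1380}{10} - \left(\frac{10}{10}\right)^2} = \sqrt{137} = 11.70$


For student B,


S.D. $\sigma(B) = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} = \sqrt{\frac{765}{10} - \left(\frac{15}{10}\right)^2} = \sqrt{74.25} = 8.62$


For student A, C.V.(A) = $\frac{\sigma(A)}{\bar{X}_A} \times 100\% = \frac{11.70}{71} \times 100\% = 16.48\%$


For student B, C.V.(B) = $\frac{\sigma(B)}{\bar{X}_B} \times 100\% = \frac{8.62}{71.5} \times 100\% = 12.06\%$

Conclusion: Since C.V. (student B) < C.V. (student A), the performance of student B is more
consistent than that of student A. Therefore, if the consistency of performance is the criterion for
awarding a prize, student B should get the prize.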
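
The three comparisons in this example (mean, median and coefficient of variation) can be reproduced directly from the raw marks. The sketch below uses Python's standard `statistics` module; `pstdev` is the population standard deviation, which matches the formula with divisor n used in this chapter. The helper name `summarize` is an illustrative assumption.

```python
from statistics import mean, median, pstdev

marks_a = [54, 58, 62, 64, 66, 70, 78, 82, 86, 90]
marks_b = [58, 61, 64, 67, 70, 73, 76, 79, 82, 85]

def summarize(marks):
    """Mean, median and coefficient of variation (in %) of a list of marks."""
    m = mean(marks)
    return {"mean": m, "median": median(marks), "cv": pstdev(marks) / m * 100}

summary_a = summarize(marks_a)
summary_b = summarize(marks_b)

for name, s in (("A", summary_a), ("B", summary_b)):
    print(f"Student {name}: mean={s['mean']}, median={s['median']}, cv={s['cv']:.2f}%")

# The student with the smaller C.V. is the more consistent performer
print("More consistent:", "B" if summary_b["cv"] < summary_a["cv"] else "A")
```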


Example 2.76 | Students' ages in the regular daytime BCA program and the morning program of Niharika
campus are described by two samples. If homogeneity in age of the class is a positive factor in
learning, suggest, with reason, which of the two groups will be easier to teach.


Regular BCA Program Morning BCA Program | 


No. of Students Age No. of Students 

ge a 

2 31 
28 5 30 5 
22. 10 29 4 
30 ] 28 6 
21 4 33 5 
25 1] 34 5 
26 6 35 

3 36 

9 32 


Solution: We find the coefficient of variation (C.V.) for each program to compare the
homogeneity.


For Regular BCA Program

Taking d = X - 25, where f is the number of students: N = 60, $\sum fd = -33$ and $\sum fd^2 = 319$.


Mean $(\bar{X}) = A + \frac{\sum fd}{N} = 25 + \frac{-33}{60} = 25 - 0.55 = 24.45$


S.D. $(\sigma) = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} = \sqrt{\frac{319}{60} - \left(\frac{-33}{60}\right)^2} = 2.239$

C.V. (Regular BCA program) = $\frac{\sigma}{\bar{X}} \times 100\% = \frac{2.239}{24.45} \times 100\% = 9.16\%$


For Morning BCA Program


Mean $(\bar{X}) = A + \frac{\sum fd}{N} = 31 + 0.2 = 31.2$


S.D. $(\sigma) = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} = 2.965$


C.V. (Morning BCA program) = $\frac{\sigma}{\bar{X}} \times 100\% = \frac{2.965}{31.2} \times 100\% = 9.50\%$

Since C.V. (Regular BCA program) < C.V. (Morning BCA program), the age of students
of the regular BCA program is more homogeneous than that of the morning BCA program.
Hence, we conclude that it is easier to teach in the Regular BCA Program.


Example 2.77 | The following are the monthly salaries in rupees of 30 employees in a firm:
139 140 126 114 88 100 62 77 103 99 
129 108 144 148 134 69 63 132 148 118 


6 133 123 ; 
The firm gave bonus of Rs. 10, 15, 20, 25, 30 and 35 for individuals in the respective salary
groups, exceeding 60 but not exceeding 75, exceeding 75 but not exceeding 90 and so on. Find
the average bonus paid and the standard deviation.


Solution: 


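Salary group (Rs.) | Bonus in Rs. (X)
61-75 | 10
76-90 | 15
91-105 | 20
106-120 | 25
121-135 | 30
136-150 | 35


Taking assumed mean A = 25 and h = 5 for the bonus, with $d' = \frac{X - A}{h}$, we have N = 30, $\sum fd' = -3$ and $\sum fd'^2 = 79$.


$\therefore$ Average bonus $(\bar{X}) = A + \frac{\sum fd'}{N} \times h = 25 + \frac{-3}{30} \times 5 = 25 - 0.5$ = Rs. 24.5


and S.D. $(\sigma) = \sqrt{\frac{\sum fd'^2}{N} - \left(\frac{\sum fd'}{N}\right)^2} \times h = \sqrt{\frac{79}{30} - \left(\frac{-3}{30}\right)^2} \times 5$ = Rs. 8.10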


Example 2.78 | The arithmetic mean and the standard deviation of 9 items are 43 and 5 respectively. If 
an item 63 is added, find the mean and standard deviation of 10 items. 


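Solution: Here, n = 9, $\bar{X} = 43$ and S.D. $(\sigma) = 5$


We have, $\bar{X} = \frac{\sum X}{n} \Rightarrow \sum X = n\bar{X} = 9 \times 43 = 387$


And $\sigma = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2}$, i.e. $5 = \sqrt{\frac{\sum X^2}{9} - 1849}$


Squaring on both sides, we get


$25 = \frac{\sum X^2}{9} - 1849 \Rightarrow \frac{\sum X^2}{9} = 25 + 1849 = 1874 \Rightarrow \sum X^2 = 16866$


According to the question, a new item 63 is added, then
n = 10, $\sum X = 387 + 63 = 450$ and $\sum X^2 = 16866 + 63^2 = 20835$


Now, new mean $(\bar{X}) = \frac{\text{New } \sum X}{n} = \frac{450}{10} = 45$


and new S.D. $(\sigma) = \sqrt{\frac{\text{New } \sum X^2}{n} - (\bar{X})^2} = \sqrt{\frac{20835}{10} - 45^2} = \sqrt{58.5} = 7.65$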
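
The steps of Example 2.78 (recover the two totals from the mean and S.D., add the new item, then recompute) can be written as a small function. This is a sketch under the chapter's convention that the S.D. uses divisor n; the function name `add_item` is hypothetical.

```python
import math

def add_item(n, mean, sd, new_value):
    """Update mean and population S.D. when one item is added to n items."""
    total = n * mean                      # sum of X
    total_sq = n * (sd ** 2 + mean ** 2)  # sum of X^2, from sd^2 = sum(X^2)/n - mean^2
    n_new = n + 1
    total += new_value
    total_sq += new_value ** 2
    mean_new = total / n_new
    sd_new = math.sqrt(total_sq / n_new - mean_new ** 2)
    return mean_new, sd_new

# Values from Example 2.78: 9 items, mean 43, S.D. 5, new item 63
new_mean, new_sd = add_item(9, 43, 5, 63)
print(f"new mean = {new_mean}, new S.D. = {new_sd:.2f}")
```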
Example 2.79 | The mean and standard deviation of 100 items are found to be 40 and 10 respectively. On
subsequent investigation, two items were found to be wrongly taken as 30 and 70 instead of 3 and 27.
Find the correct mean and correct standard deviation.


Solution: Here, n = 100, $\bar{X} = 40$, S.D. $(\sigma) = 10$


We have $\sum X = n\bar{X} = 100 \times 40 = 4000$
And S.D. $(\sigma) = \sqrt{\frac{\sum X^2}{n} - (\bar{X})^2}$, i.e. $10 = \sqrt{\frac{\sum X^2}{100} - 40^2}$
or, $100 = \frac{\sum X^2}{100} - 1600 \Rightarrow \sum X^2 = 100 \times 1700 = 170000$


According to the question, two items 30 and 70 are wrongly taken instead of 3 and 27. So,
Corrected $\sum X = 4000 - 30 - 70 + 3 + 27 = 3930$
and corrected $\sum X^2 = 170000 - 30^2 - 70^2 + 3^2 + 27^2 = 164938$


Corrected mean $= \frac{\text{Corrected } \sum X}{n} = \frac{3930}{100} = 39.3$


and corrected S.D. $(\sigma) = \sqrt{\frac{\text{Corrected } \sum X^2}{n} - (\text{Corrected mean})^2} = \sqrt{\frac{164938}{100} - (39.3)^2} = 10.24$
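
The correction procedure of Example 2.79 follows the same pattern for any number of wrongly recorded items: subtract the wrong values (and their squares) from the totals and add the right ones. A minimal sketch, with the hypothetical function name `correct_mean_sd` and the S.D. taken with divisor n as in this chapter:

```python
import math

def correct_mean_sd(n, mean, sd, wrong, right):
    """Correct the mean and population S.D. after replacing wrong items by right ones."""
    total = n * mean - sum(wrong) + sum(right)
    total_sq = (n * (sd ** 2 + mean ** 2)
                - sum(w * w for w in wrong)
                + sum(r * r for r in right))
    corrected_mean = total / n
    corrected_sd = math.sqrt(total_sq / n - corrected_mean ** 2)
    return corrected_mean, corrected_sd

# Values from Example 2.79: 30 and 70 were recorded instead of 3 and 27
corrected_mean, corrected_sd = correct_mean_sd(100, 40, 10, wrong=[30, 70], right=[3, 27])
print(f"corrected mean = {corrected_mean:.1f}, corrected S.D. = {corrected_sd:.2f}")
```

If a wrong item is to be omitted rather than replaced, the same totals apply but n must also be reduced; that variant is not shown here.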


Example 2.80 | The following two samples describe the age of the students in Patan Campus and Mechi
Campus BCA programme.


[PatanCampus|_-25 | 31 | 29 a6 1 33. | 87 | eB 
Mechi Campus 


i. Calculate the mean and standard deviation of age of students of each campus.


ii. If the homogeneity in age of the students is a positive factor for learning, which of the
two campuses will be easier to teach?


iii. Calculate the mean, variance and coefficient of variation of age of students of both
campuses taken together.
Solution: Let $X_1$ be the age of students in Patan Campus and
$X_2$ be the age of students in Mechi Campus.


Here, $N_1 = N_2 = 10$


i. Calculation of mean and standard deviation of the age of the students


For Patan Campus: Mean $(\bar{X}_1) = \frac{\sum X_1}{N_1} = \frac{268}{10} = 26.8$

Standard deviation $(\sigma_1) = \sqrt{\frac{\sum X_1^2}{N_1} - \left(\frac{\sum X_1}{N_1}\right)^2} = \sqrt{\frac{7238}{10} - (26.8)^2} = 2.36$

For Mechi Campus: Mean $(\bar{X}_2) = \frac{\sum X_2}{N_2} = \frac{304}{10} = 30.4$

Standard deviation $(\sigma_2) = \sqrt{\frac{\sum X_2^2}{N_2} - \left(\frac{\sum X_2}{N_2}\right)^2} = \sqrt{\frac{9316}{10} - (30.4)^2} = 2.73$


ii. Calculation of coefficient of variation of the age of the students


For Patan Campus: C.V.$(X_1) = \frac{\sigma_1}{\bar{X}_1} \times 100\% = \frac{2.36}{26.8} \times 100\% = 8.81\%$


For Mechi Campus: C.V.$(X_2) = \frac{\sigma_2}{\bar{X}_2} \times 100\% = \frac{2.73}{30.4} \times 100\% = 8.98\%$


Since C.V.$(X_1)$ < C.V.$(X_2)$, the age of students in Patan campus is more homogeneous than
that of Mechi campus. As homogeneity in age of the students is a positive factor for
learning, Patan campus will be easier to teach.

iii. Calculation of mean, variance and coefficient of variation of age of students of both
campuses taken together


Combined mean $(\bar{X}_{12}) = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2} = \frac{10 \times 26.8 + 10 \times 30.4}{10 + 10} = \frac{268 + 304}{20} = \frac{572}{20} = 28.6$


and their combined standard deviation is given by


$\sigma_{12} = \sqrt{\frac{N_1(\sigma_1^2 + d_1^2) + N_2(\sigma_2^2 + d_2^2)}{N_1 + N_2}}$


where $d_1 = \bar{X}_1 - \bar{X}_{12} = 26.8 - 28.6 = -1.8$ and $d_2 = \bar{X}_2 - \bar{X}_{12} = 30.4 - 28.6 = 1.8$.


$\sigma_{12} = \sqrt{\frac{10\left(2.36^2 + (-1.8)^2\right) + 10\left(2.73^2 + 1.8^2\right)}{10 + 10}} = 3.12$


Combined variance $(\sigma_{12}^2) = 9.75$


C.V. $= \frac{\sigma_{12}}{\bar{X}_{12}} \times 100\% = \frac{3.12}{28.6} \times 100\% = 10.91\%$
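
The combined mean and combined standard deviation formulas used in part iii can be checked numerically. The sketch below takes only the group sizes, means and standard deviations quoted in the solution; the function name `combine` is hypothetical.

```python
import math

def combine(n1, mean1, sd1, n2, mean2, sd2):
    """Combined mean and combined population S.D. of two groups."""
    mean12 = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean1 - mean12  # deviation of group 1 mean from the combined mean
    d2 = mean2 - mean12  # deviation of group 2 mean from the combined mean
    var12 = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / (n1 + n2)
    return mean12, math.sqrt(var12)

# Group summaries from Example 2.80 (Patan and Mechi campuses)
combined_mean, combined_sd = combine(10, 26.8, 2.36, 10, 30.4, 2.73)
combined_cv = combined_sd / combined_mean * 100

print(f"combined mean = {combined_mean:.1f}")
print(f"combined variance = {combined_sd ** 2:.2f}")
print(f"combined S.D. = {combined_sd:.2f}, C.V. = {combined_cv:.2f}%")
```

Note that the combined variance is larger than either group's own variance here, because the two group means lie 1.8 years on either side of the combined mean.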


Exercise 2.1 


Theoretical questions:
1. What is average? What are the desirable properties for an average to possess?
2. Discuss the criteria of making a choice of an appropriate average.
3. List the important properties of the averages.
4. "Arithmetic mean is regarded as the best of all averages". Explain it with reason.
5. What is the measure of central tendency called? Describe briefly the various methods of measuring
central tendency.
6. Explain the properties of good measures of central tendency.
7. What is arithmetic mean? Write its merits and demerits.
8. What do you mean by median? Under what condition is the median more suitable than other
measures of central tendency?
9. What is mode? Into how many types is data categorized based on mode?
10. When is mode said to be ill-defined? Also give two practical situations where you will recommend ...
11. State the relationship between mean, median & mode. Also point out their merits and demerits.
12. Why is an average called a measure of central tendency? Give reason.
13. "It is said that the choice of an average depends upon the particular problem in hand". Explain the
statement.
14. ... statement with illustration.
15. Define the following:
i. Partition values
ii. Weighted arithmetic mean
iii. Mode
16. What do you mean by dispersion? What are different methods of measuring dispersion?
17. What are the requisites of an ideal measure of dispersion?
18. What do you understand by absolute and relative measure of dispersion? Explain advantages of the
relative measures over the absolute measures of dispersion.
19. What are the roles of measures of dispersion in descriptive statistics?
20. Distinguish between absolute and relative measure of dispersion. Which measure will be more
preferable for the study of comparison of variability?
21. Define standard deviation and discuss its mathematical properties.
22. Define standard deviation. Why is it called an ideal measure of dispersion?
23. What do you understand by coefficient of variation? What purpose does it serve?
24. Write short notes on
a. Range
b. Quartile deviation


Exercise 2.2 


Measurement of Central Tendency 


1. The production of paddy in five places is as follows:


Find the arithmetic mean.
2. The following table gives the basic salaries of the persons employed in a factory. Calculate the
average basic salary by using
a. Direct method b. Short-cut method c. Step deviation method


Salary (Rs.) 


1700 | 1900 2100 | 2300 


No. of students 


. Calculate the mean from the following da 


Marks 


No. of workers 


Mas Growp | 0-10 [10-20 | 
es 


e students from the data given below: 


20-30 | 30-40 


[ 40-50 | 50-60 


15 


ta using direct method. 


+ 


10-50 | 10-60 


orkers. Assume that none are earning 


more than Rs.200. 


80 | 100 


120 | 140 | 160 | 180 


ope oye BCA 
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jven in the following tale. The total income of the rm 


6. a) Income of employees of a factory is 8 Compute the arithmetic mean of the income, 
: ‘ oup is Rs.3,000. Comp 
employees in the highest income group IS \°:-"» 100-150 | 150-200 | 200-250 | 250 & ove, 
0-50 | 50-100 30 70 10 
100 
Frequenc 90 150 i i is gi 
b) = the Sina data of income distribution, calculate the Se ae rire i wa mt 
income of the person in the highest oup Is Rs.435 and none is earning 


70 80 
Income (Rs.) below 40 | 50 60 | i 


[Income (Rs.) 


80 & above 
5. 


No. of persons
7. A factory pays its workers on a piece rate basis and also a bonus to each worker on the basis of
individual output in each month. The rate of bonus payable is as follows:
Output (unit) Below 75 | 75-79 | 80-84 | 85-89 | 90-94 | 95-99 over” 


| Bonus (Rs.) 35 45 50 60 70 80 
The individual output of 50 workers is given below: 
| 94 83 78 16 88 86 93 80 91 82 
| 89 97 92 84 82 80 85 83 98 103 
| 90 87 8] 99 86 95 81 88 88 87 
| | 84 97 80 75 93 101 82 82 89 1 
| 80 71 87 77 98 83 72 75 83 85 


Calculate the average bonus per worker for the month. 
8. The following are the weekly production of the product X in units of 60 workers in a manufacturing company: 
23 48 Sl 64 «82 19 33 50 39 72 35 88 
| 77 25 39 52 48 64 49 57 41 a 62 49 
| 32 54 67 46 55 52 82 44 75 56 51 63 
59 69 53 57 75 85 68 55 52 45 40 ST 


20 #42 #446 «251 4 650)0616—t—(‘<C i (tiSHC‘Qz‘KNNSC*‘«wKSOV 5575 
The management has decided to give bonus of Rs.5, 10, 15, 20 and 25 to each worker in the respective 
output group of 40 or over weekly output. Find the average bonus received by the workers. 


9. The mean of the following distribution is 1.46. Find the missing frequencies. 


a 


10. 100 salesmen were appointed in various places of Kathmandu valley and the following data were
compiled from their sales reports.

Sales (in Rs.'000') 8 - 12 | 12- 16| 16-20] 20-24] 24-28 [28 - 32] 32 - 36 |36- 4 

No. of salesman 11 13 16 14 = 9 4 

If the average sale is believed to be Rs.19.92, find the missing frequencies 


11. a) The following table represents the weekly wages of the workers in a firm. Calculate the
average weekly wage per worker.


Wages (Rs.)
Average no. of hrs worked per week


160-180]180-200]200-220 
a oe ee ae 


12. A cold store sells five different products. Find the average profit per unit of the quantity sold from
the following information.
Product | Profit per unit (Rs.) | Quantity sold (unit)
A | 4 | 150
B | 9 | 50
C | 6 | 250
D | 2 | 450
E | ... | 100


13. A professor has decided to use different weights in different evaluations of a group of students in a
year. The weights assigned are as follows:


Homework: 20%, Mid-term: 25%, Final paper: 35%, Term paper: 10% and Presentation: 10%


From the data on five students, compute the average marks of the students in the final examination.
All marks of the students are given out of 100 full marks. Who should get the scholarship, if best
performance is the criterion to award the scholarship?
Students | Home work | Mid term | Final paper | Term paper | Presentation
A 85 87 o | 4 | 9» 
B 78 91 92 88 84 
Cc 94 86 89 93 88 
D 82 84 93 88 79 
E 95 82 88 92 90 


14. From the information given below, find
a. Which factory pays larger amount as daily wage?
b. What is the average daily wage for the workers of the two factories?

 | Factory A | Factory B
No. of workers | 250 | 200
Average daily wage (in Rs.) | 20 | 25


15. The mean wage of 100 workers in a factory, running two shifts of 60 and 40 workers, is Rs.80. The
mean wage of 60 workers working in the morning shift is Rs.40. Find the mean wage of the 40
workers working in the afternoon shift.


16. The number of workers in each section of a factory and their average of daily wages are given below.
Find the average of the daily wages of all the workers in the factory.


Section
Number of workers
Average of daily wages


18. The average score of a group of students in a test was 52. The top 20% of them secured a mean
score of 80 and the lowest 25% a mean score of 31. Find the mean score of the remaining students.

19. The pass result of 40 students who took up a class test is given below:
Marks | 40 | 50 | 60
No. of students
If the average mark of all the 50 students was 51.6, find out the average marks of the students who failed.

20. The mean mark obtained by 150 students in a class is 60. The mean mark of boys is 70 and that of
girls is 55. Find the number of boys and girls in the class.

21. In a class of 50 students, 10 have failed and their average marks is 2.5. The total marks secured by
entire class were 281. Find the average marks of those students who have passed.

22. The arithmetic mean of 100 items was 40. Later on, it was found that an item 53 was misread as 83.
Find the correct mean when the wrong item is omitted.

23. The mean salary paid to 1000 employees of an establishment was found to be Rs. 180.40. Later on, it
was discovered that the salary of two employees was wrongly entered as Rs.297 and Rs.165. Their
correct salaries were Rs.197 and Rs.185 respectively. Find the correct mean salary.

24. Find the geometric mean of the following statistical data:
a. 10, 110, 120, 50, 52, 80, 37, 60
b. 125, 130, 75, 10, 45, 0.5, 0.4, 500, 1505

25. Calculate the geometric mean for the following data.
Marks obtained | 0-10 | 30-40
No. of students | 5 | 25

26. Find the average growth rate of population, which is increased by 20% in the first decade, by 25% in
the second decade, and by 44% in the third decade.

27. A machine was purchased for Rs.50,000 in 2015 A.D. Depreciation on the diminishing balance was
charged at 30% in the first year, 25% in the second year and 15% per annum during the next
three years. Find the average rate of depreciation.

28. a) Find the harmonic mean of 4, 6 and 10.
b) Compute the harmonic mean for the following data.
a a a 6 [os
f | 2 | 3 | 3 | 2

29. a) A car driver covers a distance of 200 kilometers from Kathmandu to Pokhara at the rate of 50
km/hr. In the return journey, he covers the distance at the rate of 100 km/hr. Find the average
speed of the journey to and fro.
b) Cities A, B and C are equidistant from each other. A motorist travels from A to B at 30 km/hr,
from B to C at 40 km/hr and from C to A at 50 km/hr. Determine his average speed for the
entire trip.
c) A man travelled by car for 3 days. He covered 480 km each day. On the first day, he drove for
10 hours at 48 km/hr. On the second day he drove for 12 hours at 40 km/hr and on the last day he
drove for 15 hours at 32 km/hr. What was his average speed?


30. Find the geometric mean and harmonic mean from the following data:

Class interval | 10-20 | 20-30 ar - 

Frequency 30 75 on — 50 50 - 60 
60 15 
31. a) Compute AM, GM and HM of the following observations and verify that AM > GM > HM:
10, 12, 14, 16, 18, 20

b) If for two observations, the arithmetic mean is 25 & harmonic mean is 9, what is the geometric
mean of the observations?
[Hint: Use the relationship between AM, GM and HM, i.e. GM² = AM × HM]
32. Find the median from the following statistical data.
a) 15, 10,5, 13, 12, 1, 15, 9, 8, 18 
b) 40, 50, 30, 20, 25, 35, 30, 30, 20, 30 


c) 
= : 10 11 12 13 14 5 rv 
u 9 1 
7. 5 
< ae ae ee ae 
\Profit (Rs.'00") 
No. of shops 


75 —84 | 85-94 | 95 — 104 | 105 -114| 115 —124| 125— 134 | 135 -144 
9 22 


ea 125 133 


Prepare the frequency distribution and find the median. 
35. Calculate the median from the following data. 


Daily wages (in Rs.)| ,-°%,5| 100-200 | 200-300 | 300-400 | 400-500 


Give reason why median is more appropriate than mean for this data.

36. a. A manufacturing company has 1000 employees. 10% of them earn less than Rs.500 per day,
200 earn between Rs.500 and Rs.999, 30% of the employees earn between Rs.1,000 and
Rs.1,499, 250 employees earn between Rs.1,500 and Rs.1,999 and the rest of them earn Rs.2,000
and above. Calculate the suitable average wage. Give reason.


b. Calculate the appropriate measures of central tendency from the following distribution and
support your choice.
Monthly Below 1000] 1000-1999 4000-2999 | 3000-3999 | 4000-4999 | 5000 and 
Income (Rs.) above 


= 


llowing marks distribution. Give reason for Choigg 




e the appropriate average marks from the fo! 
Marks up to as | 38 | @ | 55 | 65 | 75 a 
No. of students | [28 | 57 | 92 | 101 | 104 
1 of 13 students who had appeared in an examination, 4 students were failed. The mar 
idents were 43, 57, 45, 61, 75, 64, 53, 50 and 40. Calculate the median marks of at 
44 


mnthly expenditure in rupees for a group of families is as follows: 
ixpenditure (in Rs.) | 100-200 | 200-300 [300-350 350-400 | 400-500 
| 


No. of families 20 
_ of the expenditure is known to be Rs.3 17.50. Determine the number of families having 
4 


| iture between Rs.200 — Rs.400. 
yenditure of 1000 families is given below: 
xpenditure (Rs.) | 40-59 | 60-79 | 80-99 |100~- 1191120 — 139 
No. of families | 50 500 S| 30 
iii dian expenditure is Rs.87 obtain the missing frequencies 
q) _. Je from the following. 
5, 3,5, 5,9 
14, 14, 15, 15, 15, 15, 16, 16, 16, 17, 17, 19 
es of 50 customers visiting a shop were as follows: 
Shoe size | «@ | 4 s [| 9 | 10 


No. of customers 4 17 22 5 2 


| 


i : modal shoe size. 
; ‘* + modal weight from the following data. 


No. of persons 37 27 eh 


om sample survey was conducted by ABC shoe company to determine the size of the 
should produce so that the size can be fit for majority of people in a population. The 


ng information was obtained from the survey. 
size of foot (inches) 


e of the shoe should the company produce to fulfill the objective? 


gets a pocket money allowance Rs. 120 per day. Thinking that this was rather less, she 
ier friends about their allowances and obtained the following data which includes het 
ice also. 


120 | 180 | 100 | so | 250 | 200 | 200 | 220 | 150 | 100 | 
100 | 150 180 | 100 | 150 | 100 | 180 | 150 | 
150 | 100 | 150 | 100 | 120 | 180 | 200 | so | 80 | 


*sented this data to her father and asked for an increase in her allowance as she W® 
less than average amount. Her father countered pointing out that her allowance W 
‘more than the average amount. Reconcile these statements with reason. 


43. The frequency distribution of the marks obtained by 60 students of a class in a college is given below:

Marks
18

Find the value of mode.
44. Find out the mean and mode from the following data.

Mid value

46. a) Mr. Shrestha is the director of the student financial aid office at a campus. He has used
available data on summer earnings of all the students who have applied to his office for
financial aid. The following frequency distribution is given below:

1500-2000 | 2000-2500 | 2500-3000 | 3000 and

i. Find the modal value for Mr. Shrestha's data.
ii. If the student's aid is restricted to those whose summer earnings were at least the modal summer
earnings, how many of the applicants qualify?

b) A cement company sells its production in different cities through the appointed dealers. The
sales of its production in the last year are given in the following table:

Sales (in '00' bags) | 500-1000 | 1000-1500 | 1500-2000 | 2000-2500
No. of dealers

i. Find the value of most usual sales.
ii. Find the number of dealers selling more than the usual sales.
iii. Calculate the amount of money to be awarded to the dealers at Rs. 5000 each dealer whose
annual sale is more than the most usual sale.

47. In the marks distribution of 100 students given below, frequencies corresponding to two groups are
missing. However, the mode is known to be 24. Find the missing frequencies.
30-40 | 40-50

48. The weekly expenditure of 1000 families is given below:
Expenditure (in Rs.'00') | 40-59 | 60-79
No. of families | 50
The median for the distribution is Rs. 8700. Calculate the missing frequencies. Also calculate the
mode of the distribution.
49. From the following data, compute the mean,


50. 


§1. 


52. 


53. 


54. 


55. 


56. 


s for BCA 


median, and modal output of all the workers of sp 


the value of median. 

b) In a moderately symmetrical distribution, the value of mode and median are 20 and y
respectively. Find the value of mean.

c) If mean = 40 and mode = 30, then find the median.


Compute the first quartile, sixth decile and 82nd percentile from the following data.

82 56 90 50 120. 75 75 80 130 65 ‘ . 
Calculate the quartiles (upper quartile and lower quartile), ath decile and 60" percentile from th 
following information. 

[ cz [30-40] 40 - 50] 50-60 | 60-70 | 70-80 80 — 90| 90 — 100 
F 1 3 11 21 43 32 9 
From the following distribution of marks of 250 students of a campus, find the minimum pass mark
if only 20% of the students have failed and also find the minimum marks obtained by the top 25% of
the students.
Marks | 0 - 20 | 20 - 40 | 40 - 50 | 50 - 60 | 60 - 80 | 80 - 100
No. of students | 25 | 50 | 75 | 45 | 30 | 25
The marks distribution of 50 students in a subject is given below:
Marks more than | 0 | 10 | 20 | 30 | 40 | 50
No. of students | 50 | 46 | 40 | 20 | 10 | 3
If 60% of the students passed this test, find the minimum marks obtained by the pass candidate.
From the following distribution of income of 1500 persons, find
a. Limits of central 60% of the persons.
b. Lowest income of richest 60% of the persons
c. Highest income of poorest 60% of the persons

0 


The following table shows the wage distribution in a certain factory. 


Weekly wage No. of No. of 
employees 


employees 


Weekly wages 
(Rs.) 


120-140 35 
140-160 18 
160-180 7 


180-200 


Determine: 


a. The wage limits for he middle 50% of the wage earners 
b. The percentage of workers who earned between Rs. 75 


Income (in Rs.'000')] 0-5 
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ment and Dispersion: 


Measure : . 
57. (a) The following are the prices of shares of ABC company from Monday to Saturd 
ay: 
Dey Monday | Tuesda 
y | Wednesday | Thursd : 
Price of share(Rs.) 200 710 - y ms ay me Saute 
5 


58. 


59. 


60. 


61. 


62, 


Calculate range and its coefficient. 
Find the range temperature and its'coefficient from the data: 


nes Monday|Tuesday| Wednesday Thursday] Friday Saturday 


Find the range and the coefficient of range from the following data 


[income (Rs.‘000') | _ 50 | 60 | 70 | 80 | 90 | 100 
(a) Calculate range and its coefficient from the following data: 
No.ofStdents [8 [io fi [s ja | 


(b) The following table gives the age distribution of a group of 50 individuals. 
Age ( in years) 31-36 


Calculate range and the coefficient of range. 
The index numbers of price of cotton shares and coal shares in a given year are as follows 


sfonth _______[lan|Feb [Marc April May] unetuly| August Sept] Ost Nov Dee} 
178 


164 [172 [igs |isaliss [aii [217/232 [pao 
Price of coal shares (Rs.) _|131]130 |130 


Calculate range for each share. Which share do you consider more variable? 
(a) Compute quartile deviation and its coefficient from the follewing data. 


in ae ee 
vats | 20 [28 | 0 [2 


(b) The following data gives the height in cms of 8 persons: 
156 162 161 163 164 165 159 


(b) 


(c) 


(a) 


Compute 


[Frequency | 3 


(b) deviation an 


e wages per day in 


son 0 | 3-00 [0 
ee 


16 


(a) 
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Calculate the quartile deviation and its coefficient, 


reasure from the following data: 


64. (a) Age distribution of 200 employees of a firm is 


(b) Calculate the Semi-inter-quartile range and its relative 1 on 
2 C : 
[Variable 30-29 | 30-39 | 40-49 = 60-69 | 7 
4? 3 
ae 306 182 144 96 42 34 
(c) Calculate quartile deviation and its relative coefficient from the following data: 
2 5 18 
Central Value 6 9 12 l é 
14 24 38 20 4 
the production of new style of collar to attract young men 
are available based upon the measurement of, 


Frequency 


(d) A collar manufacturer is considering 
The following statistics of neck circumferences 


typical group of college students: 
Mid-value(inches) 


No. of students 
Calculate dispersion using quartile values and its coefficient from the distribution of ‘collar 


14.0 | 14.5 | 15.0 | 15.5 | 160 | 165] 
a 1 ee | 2} wm) 1 ; 


urrent consumed. 


units 
Calculate the lower and upper quartiles and hence find the quartile deviation. 


measurements: ooh ; 
(a) A survey of domestic consumption of electricity gave the following distribution of c 
consumed 
No. of 
(b) .The distribution of fortnightly wages of 280 employees of an undertaking is as follows 
Fortnightly eee 


No. of 
600-800 |800- 
32 
consumers 
wages (in 


Calculate the appropriate measures of dispersion from the given distribution and support for 
your choice of measure. 
(c) The following table gives the monthly income of 1000 persons: 


Income 
(in’000°Rs,) Below SO | 50-70 | 70-90 | 90-110] 110-130 130-150 150 & above 


oo | 140 | 300 | 230 [as [si 


Calculate the most suitable measure of dispersion giving reasons for your choice. 


given inthe following table: 


130 


Calculate semi-inter-quartile range. 
(b) Calculate the coefficient of quartile deviation from the following data 


Marks [Below 20 [Bel 


iecesisreicsed nienteicnnninsil 


Sheteitansaiing Deseripti 


Calculate the coefficient of quartile 
(<) Se aeoecereoonsen Quartile deviation for the following data 


~y 


Monthly deposits ‘000 Rs.° | 

(More than) ha b2 is | us | 20 
| 
| 


|| 29| 


a i : | 


, 
a } 9 | 12 
No. of accounts mn “00° 11s i 49 


] 
| 
| 
10 | 
{ 


9 6 


| 
al | « . 
ae 


(d) Find the intet-quartile range and the coefficient of 
epunesinnsiiesachndgpantonttaa she ae dataltsrube silat eh 0 


7 quartile deviation from the + 
Marks in Above T : —~ é a 


| 


| Above | Above 


. | Above | / ve | ve \ 
Statistics 0 | 10 } 20 | 7 aor nae | Rae 
as a | eR, ; 40 | 50 60 | 

No. of iso | ya. | | of ame Nocmean waar | 
Students 50; 140 | 100 | so | 80 | 70 | 30 | 
Liat L | | | 

arene om on men 


6% The following are the scores of two batsmen A and B in a series of innings 
a-12 115 6 73 7 19 119 30 84 29 40 a 
B:47 112 76 42 4 51 37 48 13 0 50 
State who is more variable by using quartile deviation. 
66. (a) Find the value of third quartile if the value of first quartile and quartile devi 
18 respectively. 
(b) Calculate the lower and upper quartile of a distribution having quartile | 
coefficient of quartile deviation 0.2 


(c) The difference between the upper quartile and lower quartile of a certain freq\ 
is 4 and their sum is 16. Calculate the quartile deviation and its coefficient. 


67. (a) From the following income distribution, calculate mean deviation from 
deviation from median. 
Income (Rs.): 1250 1285 1340 1350 1380 


(b) The following are jeu ee 
[S.No | 2 | 3 
Pv 68 [os | 


from the foll« 


m 
m mean and its coefficient 


(b) Find out mean deviation an 
Frequency [2 | 


69. Calculate the mean deviation and its coefficien 


70. If the mode for the following distri 


Expenditure in Rs. 


No. of Families 
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94 7 3 
om mean for the following data: 


71. (a) cee the mean deviation en a aa 


(b) Age distribution of hundred life insurance 


Age as On! 17-19.5 | 20-25.5 | 26-35.5 36405 | 41-05 | 51-555 56- 


Calculate mean deviation from median age and its coefficient. 
(c) Calculate mean deviation from median from the following data: 


policy holders is as follows: 


Calculate mean deviation from median and compare the variability of the two series A and B. 
Series A: 3484 4572 4124 3682 5624 4388 3680 4308 

Series B: 487 508 620 382 408 266 186 218 

73. (a) Blood serum cholesterol levels of 10 persons are as under: 

240, 260, 290, 245, 255, 288, 272, 263, 277, 251. 


Calculate standard deviation and variance. 
(b) The monthly salaries of a group of employees are given in the following table: 


salaries ins. 000) | 45] 50 | 55 | 60 | 65 | 70 | 75 | 
Nunberofenpioyees | 31s |*@]7]9 ]7[4]71 
Calculate the standard deviation of salaries. 


74. (a) The following data relate to the profit/loss made by various compani 
fais [me] om | | 
(‘0000’Rs.) 


Calculate the mean and standard deviation of profits. 
(b) Calculate standard deviation, coefficient of standard deviation and coefficient of variation from 


the following data: 
Age in years (Under) | 50 | 60 | 
. 100 | tuo | as [2s 


No. of persons dying 
No. of Students 


157 
202 
222 
230 


72. 


~I 
MN 


Up to 80 


75. 


76. 


77. 


78, 
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( 
dard deviati : led accordi :; 
the stan eviation and its coefficient: cording to the size of the articles as under. Find 


Measurement 
More than 80 


More than 30 


More than 70 Mo 
r 
More than 60 € than 20 
More than 10 
More than 50 ae 
€ 
More than 40 an 0 
ore than 90 


The number of checks cashed each day at the five branches of a bank during the past month had the
following frequency distribution:


cass 200-399 | 400-599 
600-799 | 800-99 
ert Z 


Sh sharma, director of operations for t 

of more than 200 checks per day ae: ao that a standard deviation in check cashing 

because of the uneven workload. Should Shyam ae e eras athe la aati 

(a) The coefficient of variation of a distribution is 60% and its standard deviation is 12. Find out
its mean.

(b) Find the coefficient of variation if variance is 16, number of items is 20 and sum of the items is

(c) The mean and coefficient of variation of a certain data set are 12 and 25% respectively.
Calculate the value of standard deviation and variance of the data.

(a) If $\sum f = N = 25$, $\sum fx = 230$ and $\sum fx^2 = 2660$, find the coefficient of variation.

(b) Coefficients of variation of two series are 60% and 80%. Their standard deviations are 24 and
20 respectively. What are their arithmetic means?

(c) Weekly average wages of workers in a factory increase from Rs.800 to Rs.1200 and standard
deviation increases from Rs.100 to Rs.500. Have the wages become less uniform now?


The means and standard deviations of two brands of light bulbs are given below: 


                      Brand I      Brand II
Standard deviation    100 hours    60 hours

Calculate a measure of relative dispersion for the two brands. Which brand is more uniform?
(a) For a group of 200 candidates, the mean and standard deviation were found to be 40 and 15. Later on it was discovered that the score 53 was misread as 35. Find the correct mean and standard deviation corresponding to the corrected figures.


(b) A student obtained the mean and standard deviation of 100 observations as 40 and 5.1 respectively. It was later found that one observation was wrongly copied as 40, the correct figure being 50. Find the correct mean and standard deviation.

(c) The mean and standard deviation of a set of 100 observations were found to be [...] respectively. On checking, it was found that two observations were wrongly taken as [...] instead of 43 and 18. Calculate the correct mean and correct standard deviation.

(d) The mean and standard deviation of a set of 200 items were found to be 60 and [...] respectively. At the time of checking, it was found that two items were incorrectly taken as [...] instead of 13 and 17. Calculate correct mean and correct standard deviation. What is the correct coefficient of variation?
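Problems of this kind are usually solved by recovering Σx = n·mean and Σx² = n(σ² + mean²), adjusting both sums for the wrong and right values, and recomputing. The sketch below assumes that approach; the function name `corrected_mean_sd` is hypothetical, and the standard deviation is taken with divisor n.

```python
import math

def corrected_mean_sd(n, mean, sd, wrong, right):
    """Correct a reported mean and s.d. after replacing `wrong` values by `right`.

    Passing an empty `right` list models omitting the wrong values instead.
    """
    total = n * mean                      # recovered sum of x
    total_sq = n * (sd ** 2 + mean ** 2)  # recovered sum of x^2
    total += sum(right) - sum(wrong)
    total_sq += sum(v * v for v in right) - sum(v * v for v in wrong)
    n_new = n - len(wrong) + len(right)
    new_mean = total / n_new
    new_var = total_sq / n_new - new_mean ** 2
    return new_mean, math.sqrt(new_var)

# 200 candidates, mean 40, s.d. 15; the score 53 was misread as 35.
m, s = corrected_mean_sd(200, 40, 15, wrong=[35], right=[53])
print(round(m, 2), round(s, 2))
```

The same function covers the "wrong item omitted" variant by passing `right=[]`.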


79. (a) For a number of 51 observations, the arithmetic mean and standard deviation are 58.5 and [...] respectively. It was found after calculations were made that one of the observations recorded as 15 was incorrect. Find the correct mean and standard deviation of the 50 observations if the incorrect observation is omitted.

(b) The mean and standard deviation of a group of 100 observations were found to be 20 and 3 respectively. After the calculations were made it was found that three of the observations were incorrect, which were recorded as 21, 21 and 18. Find the mean and standard deviation if the incorrect observations are omitted.

(c) The mean and standard deviation of 20 items were found [...]. At the time of checking it was found that one item 8 was incorrect. Calculate the mean and standard deviation if (i) the wrong item is omitted, and (ii) it is replaced by 12.

(d) The mean and variance of the marks in Statistics obtained by all the 50 students of a certain college was computed as 60 and 100 respectively. Later on it was discovered that the score 16 was wrongly taken as 67. Find the mean and standard deviation of scores if wrong value is omitted. Also calculate the coefficient of variation of marks after ignoring wrong value.

80. (a) The mean and the standard deviation of a sample of size 10 were found to be 9.5 and 2.5 respectively. Later on, an additional observation became available. This was 15 and was included in the original sample. Find the mean and the standard deviation of the 11 observations.

(b) The mean and standard deviation of 100 items were found to be 80 and 12 respectively. Later on an additional item 70 is available and it is included in the same set of data. Find the mean and S.D. of all items including an additional item 70.

81. (a) The average BCA entrance score of 200 students of private campuses within valley is observed to be 70 with S.D. 30 and the average entrance score of 300 students of constituent campus is observed to be 60 with S.D. 30. What is the combined average and combined standard deviation?
(b) An analysis of monthly income of workers of industry A and B is as follows:

                          Industry A   Industry B
No. of workers            500          600
Average monthly income    Rs. 4200     Rs. 4000
Standard deviation        Rs. 9        Rs. 8

Find the coefficient of variation and variance of all 1100 workers of industry A and B taken together.

(c) The first of the two sub-groups has 100 items with mean 15 and standard deviation 3. If the whole group has 250 items with mean 15.6 and standard deviation √13.44, find the mean and the standard deviation of the second sub-group.

(d) The combined mean and variance of the salary of 250 workers of city A and city B are 560 and 5497 respectively. The mean and the variance of the salary of 100 workers of city A are 650 and 121 respectively. Find the variance of the salary in city B.
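The combined-group exercises rest on two identities: the combined mean is the size-weighted mean of the group means, and the combined variance is the size-weighted mean of (σᵢ² + dᵢ²), where dᵢ is the gap between a group mean and the combined mean. A minimal Python sketch follows; the function name `combine` is an assumption, and the second sub-group's mean 16 and s.d. 4 are illustrative values chosen so that the result can be checked against the whole-group figures 15.6 and √13.44 stated in part (c).

```python
import math

def combine(groups):
    """Combine groups given as (n, mean, sd) tuples; sd uses divisor n."""
    n_total = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / n_total
    variance = sum(n * (s ** 2 + (m - mean) ** 2) for n, m, s in groups) / n_total
    return n_total, mean, math.sqrt(variance)

# First sub-group from part (c): 100 items, mean 15, s.d. 3.
# Second sub-group: 150 items with an assumed mean 16 and s.d. 4.
n, mean, sd = combine([(100, 15, 3), (150, 16, 4)])
print(n, round(mean, 2), round(sd ** 2, 2))
```

Running the identities forward like this is a quick way to verify an answer obtained by solving them backward for the unknown sub-group.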

(a) [...] two types of electric bulbs. Type I electric bulbs have a mean life of 500 hours with standard deviation 20 hours. Type II electric bulbs have a mean life of 600 hours with [...]

(b) [...] task. Group 1 was trained by program A; group 2, by program B. For the first group, it took an average of 32.11 hours to train each employee, with a variance of 68.0[...]. For the second group, it took an average of 19.75 hours to train each employee, with a variance [...]. Which training program has less relative variability in its performance?


83. 


85. 
(c) Business risk is sometimes defined as the relative dispersion of the net operating income. [...] average net operating income Rs. 5,00,000 with standard deviation Rs. 2,00,000. The corresponding figures for firm B [...]. Which firm has a greater degree of business risk?


An analysis of monthly wages paid to the workers in two firms A and B, belonging to the same industry gives the following information.

Number of workers          500
Average monthly wage       586
Variance of wages          81

i. Which firm, A or B, has a larger wage bill?
ii. In which firm, A or B, is there greater variability in wages?
iii. Calculate the mean and standard deviation of wages of all workers of the firms.

An analysis of monthly wages paid to workers in two firms A and B belonging to the same industry gives the following results:


No. of workers 


Average wage 
Variance of distribution of wage 


Find 
(i) In which firm is there greater variability in individual wages? 
(ii) The average wage and variance of wage of all workers of firm A and B taken together. 


(a) An analysis of weekly wages paid to workers in two firms A and B, belonging to the same 
industry gives the following results: : 


No. of workers 
Average wage(Rs.) 
Standard deviation of wage (Rs.) 


(i) Which firm pays larger amount as weekly wages?
(ii) Which firm shows greater variability in the distribution of weekly wages?
(iii) What is the mean and variance of all workers in two firms taken together?

(b) An analysis of monthly wages paid to workers in two firms A and B, belonging to the same industry gives the following results:

                                                 Firm A   Firm B
No. of wage earners                              550      [...]
Average monthly wages (in '00' Rs.)              50       45
Standard deviation of the distribution of
wages (in '00' Rs.)                              90       120

(i) Which firm, A or B, pays larger amount as monthly wages?
(ii) In which firm, A or B, is there greater variability in individual wages?
(iii) What are the measures of average monthly wages and standard deviation in the distribution of individual wages of all workers in the two firms taken together?


A Textbook of Probability and Statistics for BCA

86. The Shareholders Research Centre of a country has conducted a research-study on price behaviour of three leading industrial shares, A, B and C, the results of which are as follows:
price(Rs.) deviation) a 
4.5 


18.0 34.75 
22.5 39.00 


Cc 24.0 


(a) Which share appears to be more stable in value?
(b) If you are the holder of all the three shares, which one would you like to dispose of at present and why?


87. (a) Following two samples describe the age of students in the morning BCA program and evening BCA program of Niharika Campus. If homogeneity in age is a factor for learning, which of the two programs will be easier to teach?
[Evening BCA [24 ] 30 | 28 | 23 | 25 | 22 | 26 [27 | 28 | 28 
(b) The coach of University swimming team is evaluating three freshmen for the final spot on the team. He has the two swimmers compete in five 100-meter freestyle races with these results:
Swimmer X (in seconds) fo21 [ors [o32 [629 [61.7 
Swimmer Y (in seconds)   62.5   61.9   62.8   63.0   60.7
The coach feels that consistency, as well as the best average, is important. Which swimmer should he choose?


(c) XYZ Ltd. is actively considering the following two mutually exclusive projects for adoption.
Year | Project X: Profit (Rs. in Lakhs) | Project Y: Profit (Rs. in Lakhs)
10 


] 5 

2 5 Pk: 
3 20 45 
4 40 30 
5 60 30 


Which is the risky project? 


6.0 


(b) 


(d) The running capacity of two horses is given below, state which is more consistent and why? 


[Hoek [250 - 
tose 8 aan [a [aso aos] pe 


(e) From the price of shares X and Y 9} i 
given below, state which share is more : 
: stable in values: 


88. The expenditures i i 
pe involved in repairing of two truck models and the corresponding life is given as below: 


g9. (9) 


(b) 


90. (a) 


(c) 


Which set of bags has uniform pressur® 
if the prices are the same, which m 


Goals scored by two teams A and B in a football season were as follows:
No. of goals in match 0
No. of matches

By calculating the coefficient of variation in each case, find which team may be considered more consistent?

Students' age in the regular daytime BCA program and the morning program of University are described by two samples. If the homogeneity in age of the class is a positive factor in learning, make suggestion, with reason, which of the two groups will be easier to teach?

Regular BCA Program | Morning BCA Program


A sample of 60 cars of 
recorded as follows: 


No. of bikes ese 

Which model of bike has greater uniformity?
The polythene bags are taken randomly from two manufacturing companies A and B and are 
tested by a prospective buyer for bursting pressure. The results are as follows: 


Bursting pressure (in lbs.) | 10-15 | 15-20 | 20-25 | 25-30 | 30-35


No. of bags 
BL 


e will have long life time, 
anufa the buyer and why? 


13. 
14. 


16. 
20. 


27. 


30. 


36. 


38. 
40. 


(d) Two brands of tyres are 

Fite (000"km) | 20-24 [ 24-28 | 
Fr 
Brand Y Poot. We ess 
Both the brands are offering same price and 
brand has consistent life. If bana ie 
brands, which one do you prefer ! 
(e) A buyer obtained samples of electric fans from two companies. He had the samples tested in his laboratory for length of life in number of hours. Following are the results of these tests.


Number of electric fans 


a eee 2 


600-800 | 
[00-100 iP | 8 
[to00-1200 | S| 
1200-1400 «| of 
[taoo-isoo SO] S| 


What would you conclude as to which company's fan is more consistent? 


Length of the life (hours ) 


Answers 
A: Measurement of Central Tendency. 
1. 536000 tones 2. Rs.1,820 3. 34.333 4. 46.333 marks 
5. Rs.110 6. a) Rs.117.50 b) Rs. 48 7. Rs.59 
8. Rs 12 9. 76,38 10. 10,17 11. a) Rs.159.50 


b) Rs. 1576.62 12. X,=Rs. 4.65 
X̄_A = 88.55, X̄_B = 87.75, X̄_C = 89.55, X̄_D = 86.65, X̄_E = 88.50. C should get the scholarship


a) Equal b) = Rs. 22.22 15. Rs.140 

Rs. 129.72 17. 51.36 18. 21 19. 50 boys and 100 girls. 
6.4 marks 21. 39.7, 39.57 22. Rs.180.32 23. a) 52.84 

b) 35.17 24. 13.71 25. 25.64 marks 26. 28.02% 

19.08% 28. a) 5.77 b) 4.44 29. a) 66.67km/hr 
b) 38.3 km/hr c) H.M. = 37.8678 km/hr 

GM = 31.19, HM = 29.04 31. a) AM= 15.5, GM = 12, HM = 9.93 

b) 15 32. a) 12 b) 30 c) 13 

d) Rs.4,675 33. 99.34 34. Rs.13,375 35. Rs.241.22 


a) Rs.1,332.83 b) Rs.2,134.635 
c) 43.28,the frequency distribution contains open ended class interval. 37. 45 
(a) 54, (b) 263, 137 39. a)5 b) 15 
8 
aan | 41. Sa. Tkg 42. a) M,=6.5inch 

) Saraswati computed arithmetic mean and her father computed Mode 


7,5 marks 44. 30, 33.33
45. 45.71 


a Rs.1,240 ii 
46. 2)3) Mo= BS." ii) 727 eh Bea 

jit) 6,65,000 47. 23,21 i = : 0,000 ii) 133 
49. Mean = 26.32, Median = 25.54 & Mode = 24.32 59 a) i cg 

c) 36.67 51. QO, = 62.75, Ds= 8] 20 P, 7 0 Hn 5.1 b) Mean = 26 

20 Pay = 120, 

5, 01 = 97-14, Os = 83.44, D7 = 81.56, Poo= 78.37 
53. 30, 58.33 54. 25 marks 5 

by Rs.14,375 3) Retro S. a) Rs.10,625 and Rs.20,000 
56. (a) the limits for the central 50% of the employees are 82.5 and 132.14 (b) 48% 

- . . 0 


B: Dispersion 
57. (a) Range = Rs. 90, Coefficient of range = 0.22 (b) Range = 15°C, Coefficient of range = 0.263
(c) Range = Rs. 50 (in 000), Coeff. of range = 0.33
58. (a) Range = [...], Coeff. of range = 0.714 (b) Range = 20 years, Coeff. of range = 0.39
59. Range (Cotton) = 76, Coeff. of range (Cotton) = 0.19, Range (Coal) = 22, Coeff. of range (Coal) = 0.084, Cotton shares are more variable in prices.
60. (a) Q.D.=12.5 marks, Coeff. of Q.D. = 0.455 (b) Q.D. = 2.75 cms., Coeff. Of Q.D. = 0.017 
61. (a) QD.=2, Coeff. of Q.D.= 0.25 (b) Q.D. = 10, Coeff. of Q.D. = 0.2 
62. (a) Q.D.=Rs.34.5 Coeff. of Q.D. = 0.101 (b) Q.D. =10.71, Coeff. of Q.D. = 0.29 
(c) QD.=2.27, Coeff. of Q.D. = 0.204 (d) Q.D.= 0.447 inches, Coeff. of Q.D.=0.031 
63. (a) Q)=570.37 units, Q3 =1250 units, Q.D. =340 units 
(b) QD. =Rs. 177.4 (c) Q.D. = Rs.19.925 (in 000), the frequency distribution contains open class 
intervals. 
64. (a) Q.D. =4.75 years (b) ceoff. Of Q.D. = 0.273 (c) Coeff. of Q.D.=0.492 
(d) Interquartile range, Q;-Q: = Rs. 45.623, Coeff. of Q.D.= 0.55 
65. Coeff. of Q.D.(A) =0.75, coeff. of Q.D.(B) =0.59, A is more variable. 


66. (a) Q3= 140 (b) Q: =40, Qs= 60 (c) 2 and 0.25 
67. (a) M.D. from mean = Rs. 42.8, M.D. from median = Rs. 39 (b) (i) M.D. from mean = 12.94 


marks (ii) M.D. from median = 12.778 marks. 
MD. from mean = 2.8, coeff. of M.D.from mean = 0.35 


68. (a) 
(b) M.D.from median =2.4, coeff. of M.D. from median =0.24 
69. (a) M.D. from median = Rs. 31.607, coefficient of M.D. = 0.143 
10.20 


70. (a) Missing frequency = 21, M.D. from mean ~ Rs. 

71. (a) M.D. from mean =13.184 (b) M.D. from median =10.605, coeff. of M.D. from median =0.277 
(c) M.D. from median = Rs.31.607 

72. M.D. (A) = 490.25, Coeff. of M.D. (A)=0.116, M.D.(B) = 121.38, Coeff. of M.D. (B) = 0.307, 
Series B is more variable. 

73. (a) s.d. = 16.398, variance 

74, (a) Rs.22.09, Rs 13.55 (b) . 8.d.=19.758 y 
(c) sd. =17.26 marks, C.V. =42.69% (d) 8.4. 


=768.894 (b) Rs 10.35(000) 
ears, coeff. of s.d. = 0.5619 , C.V. 56.19% 


=16.83, Coefficient of s.d. =0.372 


76. 
77. (a) C.V.=50.65% 


78. 


79, 


85. 


86. 
87. 


88. 
89. 
90. 


75. Yes. He should worry about staffing next month.

(a) Type I


(b) C.V.=50% 


(a) Mean =20 
(b) 40, 25 


(c) C.V. (Initial wages) = 12.5%, C.V. (Revised wages) = 41.67%. Yes, the revised wages are more variable. (d) C.V. (I) = 12.5%, C.V. (II) = 7.79%, Brand II is more uniform
(a) Correct mean = 40.09, correct s.d. = 15.02 (b) Correct mean — naahieg ; 52 
(c) Correct mean = 40.23, correct s.d. =11.82 (d) Correct mean ~ >7.5, s.d. = 20.09, 
ct C.V. = 33.60% 
6 ia n =59.37, s f =9.21 (b) Correct mean = 20, correct s.d. = 3.9 
a ean =59.37, s.d. =9. 
ii = .d. =1.99 
(c) (i) Mean = 10.1053, s.d. =1.997 (ii) Mean = 10.2, s 
(d) Correct mean = 60.18, correct s.d. =10.20 , C.V. = 16.95% 


Mean =10, s.d. =2.86 (b) Mean =79.9, s.d. =11.99 


. (a) . | 
- (a) Combined average = 64, combined s.d. =30.40 (b) Combined variance = 9989.08, combined 


C.V. = 2.44% (c) X¥, =16,0,=4 (d) a,’ =81 

(b) Training program A (c) FirmA 

B i. B 

Combined mean = Rs. 580 and combined S.D. = 11.016 

(i) C.V. (A) =0.67%, C.V.(B) = 0.71%, Firm B 

Combined mean = 1527.27, Combined variance = 731.29 

(a) (i) FirmB (ii) C.V.(A) = 1%, C.V.(B) = 1% . Both firms show equal variability 

(iii) Xi. =Rs.1733.33, o”,, =Rs.9190.22 (b) (i) Total monthly wage (A) =Rs.27, 500 ( in 00), Total 

monthly wage (B) =Rs.29,250(in 00) Firm B (ii). C.V.(A) = 18.974%, C.V.(B) = 24.34 % , FirmB 

(iii) Xz=RS.47.29 (in 00) = Rs.4729, 72 = Rs 10.60 (in 00) = Rs. 1060 

(a) Share B is more stable (b) Since , C.V(A) > C.V(B), so Share A should be disposed of. 

(a) C.V. (Evening MBS) =9.15%, C.V. (Morning MBS) =1 1.41%; Evening MBS program 

(b) Swimmer X (c) which is the risky project? 

(d) C.V. (A) =6.93%, C.V.(B) =2.35%, Horse B (e) Share Y 

Model T, has greater uniformity. 

(a) TeamB (b) C.V. (Re. MBA)=9, 16%, C.V. (Mmg MBA) = 9.52%, Regular MBA Progra 

(a) (i) 4.8 years, 4.6 years (ii) C.V. (Make P) = 46.04%, C.V. (Make Q) = 51.30%, Make P 

(b) C.V. (Model T, ) = 38.06%, C.V. (Model Tx) = 44.71 %, Model T, has greater uniformity 

7 nian aad C.V. (B) = 33.71 %, Buyer should prefer bags manufactured by A | 
oie € tyre Y because the tyre of brand Y has more consistent life than that 0 

eo) CV; = 
i sari for leg ofits tan ca pany B) = 14.653%, Company B’s fan are mo" 
Multiple Choice Questions: circle (O) the correct answer.

1. Which is more appropriate central tendency to find [...] of profit?
(a) Arithmetic mean (b) Median (c) Mode (d) All
2. Mean is a measure of
(a) location (central value) (b) dispersion
(c) correlation (d) none of the above
3. Which of the following is a measure of central value?
(a) Median (b) Standard deviation
(c) [...] (d) Quartile deviation
4. Which of the following represents median?
(a) First quartile (b) Fiftieth percentile (c) Sixth decile (d) None of the above


5. If a constant value 50 is subtracted from each observation of a set, the mean of the set is:
(a) increased by 50 (b) decreased by 50 (c) is not affected (d) zero
6. If a constant 5 is added to each observation of a set, the mean is:
(a) increased by 5 (b) decreased by 5
(c) 5 times the original mean (d) not affected
7. If each observation of a set is multiplied by 10, the mean of the new set of observations:
(a) remains the same (b) is ten times the original mean 
(c) is one-tenth of the original mean (d) is increased by 10 
8. If each value of a series is multiplied by 10, the median of the coded values is: 
(a) not affected (b) 10 times the original median value 
(c) one-tenth of the original median value (d) increased by 10 


9. If each value of a series is multiplied by 10, the mode of the coded values is:
(a) not affected (b) one-tenth of the original modal value
(c) 10 times the original modal value (d) 100 times the original modal value
10. If each observation of a set is divided by 2, then the mean of new values:
(a) is two times the original mean (b) is decreased by 2
(c) is half of the original mean (d) remains the same
11. Which of the following relations among the location parameters does not hold?
(a) Q₂ = Median (b) P₅₀ = Median (c) D₅ = Median (d) De = Median
12. Harmonic mean is better than other means if the data are for:
(a) speed or rates (b) heights or lengths
(c) binary values like 0 and 1 (d) ratios or proportions
13. The correct relationship between A.M., G.M. and H.M. is:
(a) A.M. = G.M. = H.M. (b) G.M. ≥ A.M. ≥ H.M.
(c) H.M. ≥ G.M. ≥ A.M. (d) A.M. ≥ G.M. ≥ H.M.
14. [...] (c) geometric mean (d) harmonic mean
15. Geometric mean of two numbers [...] and [...] is:
(a) [...] (b) [...] (c) 10 (d) 100
16. Expenditure during first five months of a year is Rs. 96 per month and during last seven months is Rs. 120 per month. The average expenditure per month during whole year is:
(a) Rs. 108 per month (b) Rs. 110 per month (c) Rs. 100 per month (d) Rs. 216 per month
17. Average strength of eleven members = 11.0. Average strength of the first six members = 10.5. Average strength of the last six members = 11.5. The average strength of the sixth member is:
(a) 10.5 (b) 11.5 (c) [...] (d) 10.0
18. The average of the 7 numbers 7, 9, 12, x, 5, 4, 11 is 9. The missing number x is:
(a) 13 (b) 14 (c) [...] (d) 8
19. The [...] proportion of 0.16 and 0.01 is:
(a) [...] (b) [...] (c) 0.085 (d) [...]
20. A train covered the first 5 km of its journey at a speed of 30 km/h and next 15 km at a speed of 4[...] km/h. The average speed of the train was:
(a) 30 km/h (b) 40 km/h (c) 32 km/h (d) 42 km/h
21. The second of the two samples has 50 items with mean 15. If the whole group has [...] items with mean 16, the mean of the first sample is:
(a) 18.0 (b) 15.5 (c) 16.5 (d) none of the above
22. For a group of 100 candidates, the mean was found to be 40. Later on it was discovered that a value 45 was misread as 54. The correct mean is:
(a) 40.50 (b) 39.85 (c) 39.80 (d) 39.91
23. A distribution consists of three groups having 40, 50 and 60 items with means 20, 26 and 15 respectively. The mean of the distribution is:
(a) 20 (b) 18 (c) 15 (d) 21
24. A set of values is said to be relatively uniform if it has
(a) high dispersion (b) zero dispersion (c) little dispersion (d) negative dispersion
25. The measure of dispersion which ignores signs of the deviations from a calculated value is
(a) range (b) quartile deviation (c) standard deviation (d) mean deviation
26. Which measure of dispersion can be calculated in case of open-end intervals?
(a) range (b) quartile deviation (c) standard deviation (d) mean deviation
27. If each value of a series is divided by 5, its coefficient of variation is reduced by:
(a) 0 per cent (b) 5 per cent (c) 10 per cent (d) 20 per cent
28. If each value of a series is multiplied by 10, the coefficient of variation will be increased by:
(a) 5 per cent (b) 10 per cent (c) 15 per cent (d) 0 per cent
29. If a constant value 10 is subtracted from each value of a series, the coefficient of variation will be:
(a) [...] (b) increased in comparison to original value
(c) [...] (d) none of the above
30. If each value of a series is multiplied by a constant 'c', the coefficient of variation as compared to original value is:
(a) increased (b) decreased (c) unaltered (d) zero
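Several of the questions above ask how the mean, median and coefficient of variation react to a change of origin (adding or subtracting a constant) or a change of scale (multiplying by a constant). The following Python sketch checks these effects numerically; the data values are hypothetical and the helper `cv` is an illustrative name.

```python
import statistics

def cv(values):
    """Coefficient of variation in per cent (s.d. with divisor n)."""
    return 100 * statistics.pstdev(values) / statistics.mean(values)

data = [12, 15, 20, 25, 28]        # hypothetical observations
shifted = [x - 10 for x in data]   # change of origin
scaled = [x * 10 for x in data]    # change of scale

# The mean moves with the origin and is multiplied by the scale factor.
print(statistics.mean(data), statistics.mean(shifted), statistics.mean(scaled))

# The s.d. ignores the shift, so subtracting a constant changes the C.V.;
# scaling multiplies mean and s.d. alike, so the C.V. is unaltered.
print(round(cv(data), 2), round(cv(shifted), 2), round(cv(scaled), 2))
```

Because the standard deviation is unchanged by a shift while the mean falls, subtracting a positive constant raises the C.V. for positive data.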


3 Correlation and Regression Analysis


3.1 Introduction


Descriptive statistics such as measures of central tendency, dispersion, skewness and kurtosis are concerned with the analysis of data associated with a single variable. However, in some practical problems two or more variables are found to vary together. Even in our practical life, we come across certain conditions where changes in one variable are accompanied by changes in another variable. For example, the expenditure of a family depends on the income of the concerned family; an increase in income is expected to cause an increase in expenditure. To ascertain such association between two variables, statistical analysis is required. Correlation is a statistical tool used to measure the degree of association (relationship) between two or more variables.

Correlation analysis is a statistical technique which studies the association or relationship between two or more variables. Two variables are said to have correlation if the change (increase or decrease) in one variable is accompanied by the change (increase or decrease/decrease or increase) in the other. For example, the height and weight of children, or the quantity of fertilizer used and the yield of a crop.

The measure of correlation is the 'correlation coefficient', generally denoted by r. It measures the degree and direction of association but it does not give cause and effect of association, relationship and amount of changes. It only enables us to have an idea about the degree and direction of the relationship between the variables under the study.

Some important definitions are given below:

"If two or more quantities vary so that movements in the one tend to be accompanied by corresponding movements in the other(s) then they are said to be correlated." - L.R. Connor

"Correlation is an analysis of co-variation between two or more variables." - A.M. Tuttle

"When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in brief formula is known as correlation." - Croxton and Cowden

"The measure of relationship between two or more variables is termed as correlation." - Sir Francis Galton


The importance of studying correlation is as follows:

a. The yield of a crop is related to the quantity of fertilizer used, type of soil, quality of seeds, amount of rainfall and so on. Correlation helps in quantifying precisely the degree and direction of such relationships.

b. Correlation analysis contributes to the understanding of economic behaviour, aids in locating the critically important variables on which others depend, may reveal to the economist the connections by which disturbances spread and suggest to him the paths through which stabilizing forces may become effective. (W.A. Neiswanger)

c. In theory of economics and business studies, we come across several types of variables which show some kind of relationship. For example, there exists a relationship between price, supply and quantity demanded; advertising expenditure and sales promotion etc.

d. The concepts of regression and ratio of variation are based on measure of correlation.

e. Measures of correlation give us more reliable prediction of variables. According to Tippett, "the effect of correlation is to reduce the range of uncertainty of our prediction".

f. In most researches in social sciences, correlation analysis helps in arriving at very important conclusions.
3.2 Types of Correlation
Following are the types of correlation.

1. Positive Correlation: Positive correlation indicates that the values of the variables deviate in the same direction. Two variables are said to have positive correlation if an increase (or decrease) in the value of one variable results in an increase (or decrease) in the value of the other variable. For example, family income and expenditure on luxury items, height and weight etc.

i)  Increasing ↑ X: 17 20 25 30 34
    Increasing ↑ Y: 8 12 15 22 [...]

ii) Decreasing ↓ X: 60 51 40 3[...]
    Decreasing ↓ Y: 18 17 10 7


2. Negative Correlation: Negative correlation indicates that the values of the variables deviate in opposite directions. Two variables are said to have negative correlation if an increase (or decrease) in the value of one variable results in a decrease (or increase) in the value of the other variable. It is also known as inverse correlation. For example, price and demand of a commodity, temperature and sale of woollen garments etc.


3. Linear Correlation: Two variables are said to have linear correlation if a unit change in the value of one variable results in a constant change in the value of the other variable over the entire range of values. For example, the correlation between 'the number of students admitted' and the 'monthly fee collected' is linear in nature.


5. Simple, Partial and Multiple Correlations 


Simple Correlation: The degree of relationship between only two variables is called simple correlation, e.g. (i) a study on the yield of crop with respect to only the amount of fertilizer, (ii) sales revenue with respect to the amount of money spent on advertisement.

Partial Correlation: It is also called net correlation. It arises when there are more than two correlated variables: when we study the relationship between two of the variables taking the remaining ones constant, it is called partial correlation. For example, the relationship between deposit and income level keeping interest rate constant.


3.3 Methods of Studying Simple Correlation
The commonly used methods for studying linear correlation between two variables are:
1. Scatter Diagram (Graphic method)
2. Karl Pearson's Coefficient of Correlation (Covariance method)
3. Bivariate Correlation Method (Two way frequency table)
4. Spearman's Rank Correlation Method

3.3.1 Scatter Diagram (Graphic method) 


Scatter diagram is one of the simplest diagrammatic representations of a bivariate distribution, consisting of dots or points. It is a graphical method of studying the correlation between two variables. In this method, the points are represented by dots by keeping the values of one variable on the X-axis and the values of the other variable on the Y-axis in the XY-plane. The diagram of dots so obtained is called a scatter diagram. The scatter diagram is used to observe the existence of correlation between two variables. The following are the different types of pattern in the scatter diagram of bivariate data.

a. The pattern of points on a scatter diagram reveals an upward trend rising from bottom left to top right. The variables in this case show positive correlation between them.

b. The pattern of points on a scatter diagram reveals a downward trend falling from top left to bottom right. The variables in this case show negative correlation between them.

c. When we cannot trace any trend, there is no correlation between them.

The following are some scatter diagrams depicting different types of correlation between two variables.


[Figure: scatter diagrams showing perfect positive correlation, perfect negative correlation, high degree positive correlation, high degree negative correlation, low degree positive correlation, low degree negative correlation and no correlation.]

Observations on Scatter Diagram

a. If the points on the scatter diagram show very little spread, then a fairly good amount of correlation can be expected between the two variables, and if the points on the scatter diagram show a wide spread, then poor correlation may be expected.

b. The scatter diagram provides a rough measure of correlation only. It gives the direction and degree of correlation between the two variables.

c. It enables us to locate the line of best fit.


Merits:
a. It is the simplest method of measuring correlation.
b. It is least affected by the extreme observations.
c. It can be easily understood by a non-statistician.
d. It helps us to detect abnormal variates in the data.

Demerits:
a. It gives only a rough idea.
b. It cannot be numerically expressed.
c. It is not amenable to algebraic treatment.


3.3.2 Karl Pearson's Coefficient of Correlation (Covariance Method)

Karl Pearson developed a widely used mathematical formula to measure the degree (intensity) of linear relationship between two variables, called Karl Pearson's correlation coefficient. It is also called the product moment correlation coefficient or simple correlation coefficient. This method is based on the linear relationship between two variables (series). Karl Pearson's correlation coefficient is especially useful when data are quantitatively measured.

According to Pearson, the correlation coefficient between two variables X and Y is denoted by r(X, Y) or simply r and is defined as the ratio of the covariance between them to the product of their corresponding standard deviations.

Coefficient of correlation,

$$r = \frac{\operatorname{Cov}(X, Y)}{\sqrt{V(X)}\,\sqrt{V(Y)}} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} \qquad \ldots (i)$$

where Cov(X, Y) is the covariance between the two variables X and Y:

$$\operatorname{Cov}(X, Y) = \frac{1}{n}\sum (X - \bar{X})(Y - \bar{Y})$$

Covariance measures the simultaneous changes between two variables.

$$V(X) = \sigma_X^2 = \frac{1}{n}\sum (X - \bar{X})^2, \qquad \text{S.D.}(X) = \sigma_X = \sqrt{\frac{1}{n}\sum (X - \bar{X})^2}$$

$$V(Y) = \sigma_Y^2 = \frac{1}{n}\sum (Y - \bar{Y})^2, \qquad \text{S.D.}(Y) = \sigma_Y = \sqrt{\frac{1}{n}\sum (Y - \bar{Y})^2}$$

Substituting Cov(X, Y), $\sigma_X$ and $\sigma_Y$ in (i) we have,

$$r = \frac{\frac{1}{n}\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\frac{1}{n}\sum (X - \bar{X})^2}\,\sqrt{\frac{1}{n}\sum (Y - \bar{Y})^2}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2}\,\sqrt{\sum (Y - \bar{Y})^2}} \qquad \ldots (ii)$$

On simplification, we get

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}} \qquad \ldots (iii)$$

This method is also known as the direct method.
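The deviation form (ii) and the direct form (iii) are algebraically the same quantity, which can be confirmed numerically. The following is a minimal Python sketch under stated assumptions: the function names `pearson_deviation` and `pearson_direct` and the income and expenditure figures are hypothetical, chosen only for illustration.

```python
import math

def pearson_deviation(xs, ys):
    """r from deviations about the means, as in formula (ii)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

def pearson_direct(xs, ys):
    """r from raw sums, as in the direct method (iii)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

income = [17, 20, 25, 30, 34]       # hypothetical X values
expenditure = [8, 12, 15, 22, 25]   # hypothetical Y values
print(round(pearson_deviation(income, expenditure), 4))
print(round(pearson_direct(income, expenditure), 4))
```

The direct form avoids computing the means first, which is convenient for hand calculation; the deviation form tends to be numerically steadier when the values are large.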
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‘ferent formulae for calculatin : 
Diffe g Karl Pearson's Coefficient of Correlation 


1. Product moment method:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2}\,\sqrt{\sum (Y - \bar{Y})^2}} = \frac{\sum xy}{\sqrt{\sum x^2}\,\sqrt{\sum y^2}} \qquad \ldots (iv)$$

where x and y denote the deviations of X and Y from their arithmetic means $\bar{X}$ and $\bar{Y}$ respectively, i.e. $x = X - \bar{X}$, $y = Y - \bar{Y}$. This method is known as actual mean method.

2. Direct method:

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}}$$

3. If the actual means of X and Y are in fraction, the calculation of Pearson's correlation coefficient can be simplified by taking deviations of X and Y values from their assumed means A and B respectively. That is, U = X − A and V = Y − B, where A and B are assumed means of X and Y series. The formula (iii) becomes as given below and is known as short cut method.

Assumed mean method (i.e. Change of origin)

$$r = \frac{n\sum UV - (\sum U)(\sum V)}{\sqrt{n\sum U^2 - (\sum U)^2}\,\sqrt{n\sum V^2 - (\sum V)^2}} \qquad \ldots (v)$$

4. Step deviation method (i.e. Change of origin and scale)

$$r = \frac{n\sum U'V' - (\sum U')(\sum V')}{\sqrt{n\sum U'^2 - (\sum U')^2}\,\sqrt{n\sum V'^2 - (\sum V')^2}} \qquad \ldots (vi)$$

where, $U' = \frac{X - A}{h}$, $V' = \frac{Y - B}{k}$,
h = Common factor for variable X, k = Common factor for variable Y,
A = Assumed mean of X-series, B = Assumed mean of Y-series.
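The formulae above can be checked against each other numerically. The following Python sketch computes r by the direct method, the actual mean method and the step deviation method and prints the three values, which should coincide. The data values, the assumed means A = 30 and B = 30 and the common factors h = 10 and k = 1 are hypothetical and chosen only for illustration.

```python
from math import sqrt

def r_direct(xs, ys):
    """Direct method, formula (iii)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))

def r_actual_mean(xs, ys):
    """Actual mean (product moment) method, formula (iv)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    x = [a - mx for a in xs]          # deviations from the mean of X
    y = [b - my for b in ys]          # deviations from the mean of Y
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))

def r_step_deviation(xs, ys, A, B, h=1.0, k=1.0):
    """Step deviation method, formula (vi); h = k = 1 gives formula (v)."""
    u = [(a - A) / h for a in xs]
    v = [(b - B) / k for b in ys]
    return r_direct(u, v)

# Hypothetical data, for illustration only
X = [10, 20, 30, 40, 50]
Y = [12, 25, 29, 46, 51]

r1 = r_direct(X, Y)
r2 = r_actual_mean(X, Y)
r3 = r_step_deviation(X, Y, A=30, B=30, h=10, k=1)
print(round(r1, 4), round(r2, 4), round(r3, 4))
```

Because all three functions implement the same quantity, any difference between the printed values would point to an arithmetic slip rather than to a property of the data.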


3.3.3 Interpretation of Correlation Coefficient 


The degree of correlation according to Karl Pearson's formula is interpreted as follows:

Degree of correlation | Positive | Negative
Very high degree of correlation | |
High degree of correlation | |
Moderate degree of correlation | |
Low degree of correlation | |
Very low degree of correlation | less than +0.25 | less than −0.25
No correlation | |


3.4 Properties of Karl Pearson’s Correlation Coefficient 
1. Correlation coefficient (r) lies between −1 and +1. Symbolically, $-1 \le r \le +1$.

2. Correlation coefficient is independent of change of origin and scale.
Mathematically, if x and y are given variables and they are transformed into new variables u and v by change of origin and scale, viz. $u = \frac{x - A}{h}$ and $v = \frac{y - B}{k}$, where A, B, h and k are constants, h > 0, k > 0; then the correlation coefficient between x and y is the same as the correlation coefficient between u and v, i.e. r(x, y) = r(u, v) or $r_{xy} = r_{uv}$.

3. Correlation coefficient is the geometric mean between two regression coefficients, i.e. $r = \pm\sqrt{b_{yx} \cdot b_{xy}}$
where, $b_{yx}$ = Regression coefficient of regression line of y on x
$b_{xy}$ = Regression coefficient of regression line of x on y

Note: The sign of correlation coefficient r is determined according to the sign of the regression coefficients. If both regression coefficients are positive then r is also positive and if both regression coefficients are negative then r is also negative.

$$b_{yx} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}, \qquad b_{xy} = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - (\sum Y)^2}$$

4. r is a relative statistical measure, so it is a pure number independent of the unit of measurement.

5. Two independent variables are uncorrelated but the converse may not be true, i.e. two uncorrelated variables need not be independent.
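Properties 2 and 3 can be verified numerically. The sketch below, written in Python with hypothetical data and hypothetical constants A, B, h and k, checks that r is unchanged by a change of origin and scale and that it equals the signed geometric mean of the two regression coefficients.

```python
from math import sqrt

def raw_sums(xs, ys):
    """Return n, sum X, sum Y, sum XY, sum X^2, sum Y^2."""
    n = len(xs)
    return (n, sum(xs), sum(ys),
            sum(x * y for x, y in zip(xs, ys)),
            sum(x * x for x in xs),
            sum(y * y for y in ys))

def pearson_r(xs, ys):
    n, sx, sy, sxy, sxx, syy = raw_sums(xs, ys)
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))

def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) as defined under property 3."""
    n, sx, sy, sxy, sxx, syy = raw_sums(xs, ys)
    num = n * sxy - sx * sy
    return num / (n * sxx - sx ** 2), num / (n * syy - sy ** 2)

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Property 2: change of origin and scale (A = 3, h = 2, B = 4, k = 0.5 are arbitrary)
u = [(a - 3) / 2 for a in x]
v = [(b - 4) / 0.5 for b in y]
r_xy = pearson_r(x, y)
r_uv = pearson_r(u, v)

# Property 3: r takes the common sign of the regression coefficients
b_yx, b_xy = regression_coefficients(x, y)
sign = 1 if b_yx >= 0 else -1
r_geo = sign * sqrt(b_yx * b_xy)

print(round(r_xy, 4), round(r_uv, 4), round(r_geo, 4))
```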


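Example 3.1

Find Karl Pearson's coefficient of correlation from the following data.

Solution: Calculation of Correlation Coefficient (product moment method)

We have $\bar{X} = \frac{\sum X}{n} = \frac{20}{5} = 4$ and $\bar{Y} = \frac{\sum Y}{n} = \frac{19}{5} = 3.8$

$$r = \frac{\sum xy}{\sqrt{\sum x^2}\,\sqrt{\sum y^2}} = \frac{-9}{\sqrt{20}\,\sqrt{12.8}} = -0.5625 \quad \text{[Using actual mean]}$$

Alternatively, (Direct method)

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}} = \frac{5 \times 67 - 20 \times 19}{\sqrt{5 \times 100 - 20^2}\,\sqrt{5 \times 85 - 19^2}} = -0.5625$$

There is moderate negative correlation between X and Y.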


Merits and Demerits of Karl Pearson's Coefficient

Merits:
1. It is based on all the observations.
2. It indicates magnitude as well as direction of linear correlation between the variables.
3. There is no chance of personal bias in its computation.
4. It is the best method of computing simple linear correlation.

Demerits:
a. It is applicable only when the correlation between the variables is linear.
b. It is affected by extreme values.
c. Its interpretation is not easy.


3.4.1 Probable Error (P.E) 


Probable error of the correlation coefficient is a measure for testing the reliability of the calculated value of r. It is generally denoted by P.E.(r). If r is the value calculated from a sample of n pairs of observations, then P.E.(r) is given by

$$P.E.(r) = 0.6745 \times \frac{1 - r^2}{\sqrt{n}}$$

The probable error of r may be used to determine the limits within which the population correlation coefficient lies. Limits (range) for the population correlation coefficient are r ± P.E.(r).

Another use of P.E.(r) is to test whether the value of the sample correlation coefficient is significant for any correlation in the population. For this, the following rules are used:

i) If |r| < P.E.(r), then r is not significant at all.

ii) If |r| > 6 P.E.(r), then r is definitely significant.

iii) In other situations, nothing can be concluded with certainty.
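The three rules above translate directly into a small decision function. The following Python sketch is only an illustration; the function names and the returned labels are hypothetical, and the sample call uses r = 0.4 with n = 10 as trial values.

```python
from math import sqrt

def probable_error(r, n):
    """P.E.(r) = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / sqrt(n)

def significance(r, n):
    """Apply rules (i) to (iii) and return a short label."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "not significant"
    if abs(r) > 6 * pe:
        return "significant"
    return "inconclusive"

def population_limits(r, n):
    """Limits r - P.E.(r) and r + P.E.(r) for the population coefficient."""
    pe = probable_error(r, n)
    return r - pe, r + pe

pe = probable_error(0.4, 10)          # approximately 0.179
label = significance(0.4, 10)         # falls under rule (iii)
low, high = population_limits(0.4, 10)
print(round(pe, 3), label, round(low, 2), round(high, 2))
```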


Example 3.2 If the correlation coefficient of 10 pairs of observations is 0.4, test whether the value of r is significant or not. Also compute the limits within which the population correlation coefficient may be expected to lie.

Solution: Here, n = 10 and r = 0.4

We have, $P.E.(r) = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} = 0.6745 \times \frac{1 - 0.4^2}{\sqrt{10}} = 0.6745 \times \frac{0.84}{3.1623} = 0.18$

and 6 × P.E.(r) = 6 × 0.18 = 1.08

We see that neither |r| < P.E.(r) nor |r| > 6 × P.E.(r). No conclusion can be drawn with certainty.

Now, limits for the population correlation coefficient are
r ± P.E.(r) = 0.4 ± 0.18, i.e. 0.58 and 0.22

Example 3.3 If sample size n is 50, variance of X is 9, S.D. of Y is 4 and covariance is 9.8, find Karl Pearson's correlation coefficient.

Solution: We have, n = 50, variance of X = $\sigma_X^2$ = 9, S.D. of X = $\sigma_X$ = 3, S.D.(Y) = $\sigma_Y$ = 4, Cov(X, Y) = 9.8

Now, $r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\,\sigma_Y} = \frac{9.8}{3 \times 4} = 0.82$ (High degree of positive correlation)

Example 3.4 Compute the coefficient of correlation from the following results obtained between two variables.

 | Variable X | Variable Y
No. of items | 7 | 7
Arithmetic mean | 4 | 8
Sum of squares of deviations from arithmetic mean | 28 | 76

Summation of products of deviations of variables X and Y from their respective means is 46.

Solution: In the usual notations, we have given
n = 7, $\bar{X}$ = 4, $\bar{Y}$ = 8, $\sum (X - \bar{X})^2 = \sum x^2 = 28$,
$\sum (Y - \bar{Y})^2 = \sum y^2 = 76$, $\sum (X - \bar{X})(Y - \bar{Y}) = \sum xy = 46$

So, $r = \frac{\sum xy}{\sqrt{\sum x^2}\,\sqrt{\sum y^2}} = \frac{46}{\sqrt{28}\,\sqrt{76}} = 0.997$

There is very high degree of positive correlation between two variables X and Y.

Example 3.5 Find Karl Pearson's correlation coefficient for the following data:

Use: a) Direct method b) Deviation from assumed mean method
c) Deviation from actual mean method

Solution: a) Direct method

Here, n = 5, $\sum X = 30$, $\sum Y = 45$, $\sum XY = 298$, $\sum X^2 = 220$, $\sum Y^2 = 425$

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}} = \frac{5(298) - (30)(45)}{\sqrt{5(220) - (30)^2}\,\sqrt{5(425) - (45)^2}} = 0.99$$

There is very high degree of positive correlation between two variables X and Y.

b) Deviation from assumed mean method

Let A = 4 and B = 10, so that U = X − 4 and V = Y − 10.

$$r = \frac{n\sum UV - \sum U \sum V}{\sqrt{n\sum U^2 - (\sum U)^2}\,\sqrt{n\sum V^2 - (\sum V)^2}} = \frac{5(18) - (10)(-5)}{\sqrt{5(60) - (10)^2}\,\sqrt{5(25) - (-5)^2}} = 0.99$$

c) Deviation from actual mean method

Computation of Correlation Coefficient

Here, $\bar{X} = \frac{\sum X}{n} = \frac{30}{5} = 6$, $\bar{Y} = \frac{\sum Y}{n} = \frac{45}{5} = 9$, $\sum x^2 = 40$, $\sum y^2 = 20$, $\sum xy = 28$

Now, $r = \frac{\sum xy}{\sqrt{\sum x^2}\,\sqrt{\sum y^2}} = \frac{28}{\sqrt{40}\,\sqrt{20}} = 0.99$ (Very high degree of positive correlation)


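Example 3.6 If r = 0.5, $\sum xy = 120$, $\sigma_y = 8$ and $\sum x^2 = 90$, find the number of items, where x and y are deviations from their respective means.

Solution: We have given,

r = 0.5, $\sum xy = 120$, $\sigma_y = 8$, $\sum x^2 = 90$, where $x = X - \bar{X}$ and $y = Y - \bar{Y}$

Given, $\sigma_y = \sqrt{\frac{1}{n}\sum (Y - \bar{Y})^2} = 8 \Rightarrow \frac{1}{n}\sum y^2 = 64 \Rightarrow \sum y^2 = 64n$

Then Karl Pearson's correlation coefficient between variables X and Y is given by

$$r = \frac{\sum xy}{\sqrt{\sum x^2}\,\sqrt{\sum y^2}} \Rightarrow 0.5 = \frac{120}{\sqrt{90}\,\sqrt{64n}}$$

$$\Rightarrow (0.5)^2 = \frac{(120)^2}{90 \times 64n} \quad [\because \text{squaring on both sides}]$$

$$\Rightarrow n = \frac{(120)^2}{90 \times 64 \times (0.5)^2} = 10$$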


ee aa Oe 
Height of Son |S | 


Solution: Let X be the variable of father's height and Y be the variable of son's height (in inches).

Computation of Correlation Coefficient

Here, $\bar{X} = \frac{\sum X}{n} = \frac{330}{5} = 66$, $\bar{Y} = \frac{\sum Y}{n} = \frac{335}{5} = 67$


i Ps ee nf 
Ep v1 


There is very high degree of positive correlation between X & Y. 


Example 3.8 Find Karl Pearson's correlation coefficient between two variables X and Y from 12 pairs of observations, given
$\sum X = 30$, $\sum Y = 5$, $\sum X^2 = 670$, $\sum Y^2 = 285$, $\sum XY = 385$

Solution: Number of observations (n) = 12
$\sum X = 30$, $\sum Y = 5$, $\sum X^2 = 670$, $\sum Y^2 = 285$, $\sum XY = 385$

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}} = \frac{12 \times 385 - 30 \times 5}{\sqrt{12 \times 670 - 30^2}\,\sqrt{12 \times 285 - 5^2}} = 0.91$$

There is very high degree of positive correlation between X and Y.


Example 3.9 Calculate Karl Pearson's correlation coefficient for the following data of sales and expenses in thousands of rupees of 5 firms.

Solution: Let X be the variable of sales and Y be the variable of expenses in '000 Rs.
Let the assumed mean of X-series be A = 41 and the assumed mean of Y-series be B, so that U = X − 41 and V = Y − B.

Computation of Correlation Coefficient

Therefore, Karl Pearson's correlation coefficient between sales (X) and expenses (Y) is given by

$$r = \frac{n\sum UV - \sum U \sum V}{\sqrt{n\sum U^2 - (\sum U)^2}\,\sqrt{n\sum V^2 - (\sum V)^2}} = \frac{5 \times (-8) - (-5) \times (-4)}{\sqrt{5 \times 159 - (-5)^2}\,\sqrt{5 \times 94 - (-4)^2}} = -0.10$$

There is very low degree of negative correlation between sales and expenses.


Example 3.10 Compute the coefficient of correlation from the following data:

Solution: Let assumed mean of X-series (A) = 300 and h = 10;
assumed mean of Y-series (B) = 165 and k = 5.

U' and V' are obtained as $U' = \frac{X - 300}{10}$ and $V' = \frac{Y - 165}{5}$

Karl Pearson's correlation coefficient between X and Y is given by

$$r = \frac{n\sum U'V' - (\sum U')(\sum V')}{\sqrt{n\sum U'^2 - (\sum U')^2}\,\sqrt{n\sum V'^2 - (\sum V')^2}} = \frac{6 \times 111 - 7 \times 3}{\sqrt{6 \times 261 - 7^2}\,\sqrt{6 \times 59 - 3^2}} = 0.89$$

There is high degree of positive correlation between number of items and number of defective items.


Example 3.11 The following table gives the distribution of the total population and those who are wholly or partially blind among them. Find out if there is any relationship between age and blindness.

Age | 0-10 | 10-20 | 20-30 | 30-40 | 40-50 | 50-60 | 60-70 | 70-80
Total population | 100 | 60 | 40 | 36 | 24 | 11 | 6 | 3
No. of blind | 55 | 40 | 40 | 40 | 36 | 22 | 18 | 15

Solution: Here we have to compute Karl Pearson's correlation coefficient between age (X) and blindness (Y), where blindness is obtained as the number of blind per 100 of the total population of the respective age group.

Age | Mid value (X) | Total population | No. of blind | Blindness (Y)
0-10 | 5 | 100 | 55 | (55/100) × 100 = 55
10-20 | 15 | 60 | 40 | (40/60) × 100 = 66.67 ≈ 67
20-30 | 25 | 40 | 40 | (40/40) × 100 = 100
30-40 | 35 | 36 | 40 | (40/36) × 100 = 111.11 ≈ 111
40-50 | 45 | 24 | 36 | (36/24) × 100 = 150
50-60 | 55 | 11 | 22 | (22/11) × 100 = 200
60-70 | 65 | 6 | 18 | (18/6) × 100 = 300
70-80 | 75 | 3 | 15 | (15/3) × 100 = 500

Computation of Correlation Coefficient between age and blindness

Taking $U' = \frac{X - 35}{10}$ and $V' = Y - 200$, Karl Pearson's coefficient of correlation is given by

$$r = \frac{n\sum U'V' - \sum U' \sum V'}{\sqrt{n\sum U'^2 - (\sum U')^2}\,\sqrt{n\sum V'^2 - (\sum V')^2}} = \frac{8 \times 2251 - 4 \times (-117)}{\sqrt{8 \times 44 - (4)^2}\,\sqrt{8 \times 159135 - (-117)^2}} = 0.898$$

Hence, there is very high positive correlation between age and blindness.


Example 3.12 Compute the coefficient of correlation from the following data by Karl Pearson's method.
#29 [30 [38 [40 [32 
Also (a) Calculate its probable error.
(b) Interpret if the value of r is significant or not.
(c) Determine the limits within which the population correlation coefficient may be expected to lie.

Solution: Let X be the price of tea and Y be the price of coffee in rupees.

Computation of Correlation Coefficient

Let U and V be the deviations of X and Y from their assumed means, with V = Y − 38.

Karl Pearson's coefficient of correlation is

$$r = \frac{n\sum UV - \sum U \sum V}{\sqrt{n\sum U^2 - (\sum U)^2}\,\sqrt{n\sum V^2 - (\sum V)^2}} = \frac{8 \times 241 - (-11) \times (-13)}{\sqrt{8 \times 207 - (-11)^2}\,\sqrt{8 \times 295 - (-13)^2}} = 0.9733$$

a) Probable error of correlation coefficient is given by

$$P.E.(r) = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} = 0.6745 \times \frac{1 - (0.9733)^2}{\sqrt{8}} = 0.01256$$

b) To test significance of r
6 × P.E.(r) = 6 × 0.01256 = 0.0754

Since r is much greater than 6 × P.E.(r), the value of r is highly significant.

c) Limits of population correlation coefficient
r ± P.E.(r) = 0.9733 ± 0.0126, i.e. 0.9607 and 0.9859


Example 3.13 A computer while calculating the correlation coefficient between two variables X and Y from 25 pairs of observations obtained the following information:
n = 25, $\sum X = 125$, $\sum X^2 = 650$, $\sum Y = 100$, $\sum Y^2 = 460$, $\sum XY = 508$

It was, however, discovered later at the time of checking that it had copied down two pairs of observations wrongly as

X | 6 | 8
Y | 14 | 6

while the correct values were

X | 8 | 6
Y | 12 | 8

Obtain the correct value of the correlation coefficient between X and Y.

Solution: We have given,
n = 25, $\sum X = 125$, $\sum X^2 = 650$, $\sum Y = 100$, $\sum Y^2 = 460$, $\sum XY = 508$

Now, Corrected $\sum X$ = 125 − 6 − 8 + 8 + 6 = 125
Corrected $\sum Y$ = 100 − 14 − 6 + 12 + 8 = 100
Corrected $\sum X^2$ = 650 − 6² − 8² + 8² + 6² = 650
Corrected $\sum Y^2$ = 460 − 14² − 6² + 12² + 8² = 436
Corrected $\sum XY$ = 508 − 6 × 14 − 8 × 6 + 8 × 12 + 6 × 8 = 520

Corrected r is

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}} = \frac{25 \times 520 - 125 \times 100}{\sqrt{25 \times 650 - (125)^2}\,\sqrt{25 \times 436 - (100)^2}} = 0.67$$

There is moderate positive correlation between X and Y.
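The correction procedure used in this kind of problem (remove the contribution of each wrongly copied pair from every sum, then add the contribution of the correct pair) can be sketched in Python as follows. The function name and argument layout are hypothetical; the figures are those quoted in the solution above.

```python
from math import sqrt

def corrected_r(n, sx, sy, sxx, syy, sxy, wrong_pairs, right_pairs):
    """Adjust the raw sums for miscopied (X, Y) pairs and return the corrected r."""
    for x, y in wrong_pairs:          # take out what the wrong pairs contributed
        sx -= x; sy -= y
        sxx -= x * x; syy -= y * y
        sxy -= x * y
    for x, y in right_pairs:          # put in what the correct pairs contribute
        sx += x; sy += y
        sxx += x * x; syy += y * y
        sxy += x * y
    num = n * sxy - sx * sy
    den = sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2)
    return num / den

r = corrected_r(n=25, sx=125, sy=100, sxx=650, syy=460, sxy=508,
                wrong_pairs=[(6, 14), (8, 6)],
                right_pairs=[(8, 12), (6, 8)])
print(round(r, 4))
```

Note that n itself is unchanged, because the wrong pairs are replaced rather than dropped.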


Example 3.14 From the following data, find out if there is any relationship between density of population and death rate.

Solution: Here, we have to compute the coefficient of correlation between density of population and death rate. Therefore, we should first calculate density and death rates using the formulae,

$$\text{Density} = \frac{\text{Population}}{\text{Area}} \quad \text{and} \quad \text{Death rate} = \frac{\text{No. of deaths}}{\text{Population}} \times 1000$$


Calculation of Corr’ 


120 24000 288 


150 75000 1125 15 


130 
1 


8 


2370 


Then, poe - 23 _ gg 
Var.\fzyJoaso00-frn8 


48000 768 “30 =6 


50 40000 | 720 | so =8 
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d Death Rate 


elation between Density an 


i ; No. of 
24000 _ 


Death rate 


x 1000 = 12 


x 1000 = 15 


x 1000 = 18 


x 1000 = 13 


There is high degree of positive correlation between population density and death rate.


Example 3.15 The following are the monthly figures of advertising expenditure and sales of a firm. It is generally found that advertising expenditure has its impact on sales generally after 2 months. Allowing for this time-lag, calculate the coefficient of correlation.

Solution: Allow for a time-lag of 2 months, i.e. link the advertising expenditure of January with the sales for March, and so on.

Let X and Y be two variables of advertising expenditure (in lakhs) and sales (in lakhs) respectively.

Computation of Correlation Coefficient


There is high degree of positive correlation between advertising expenditure and sales. 


3.4.2 Karl Pearson's Correlation for Bivariate Frequency Distribution (Two-way Frequency Table)

When the number of observations in a bivariate distribution is fairly large, then in order to facilitate the calculation of the correlation coefficient, the data are often classified according to two measurements in a two-way frequency table called a bivariate frequency table or bivariate frequency distribution. In this distribution, the values of one variable are kept in rows and the values of the other variable are kept in columns. These values can be discrete or continuous. The frequencies for each cell of the table are determined by tally bars or tally marks.

The correlation coefficient of a bivariate distribution is computed by using the following formulae:

Direct method

$$r = \frac{N\sum fXY - (\sum fX)(\sum fY)}{\sqrt{N\sum fX^2 - (\sum fX)^2}\,\sqrt{N\sum fY^2 - (\sum fY)^2}}$$

Shortcut method (Change of origin/Deviation method)

$$r = \frac{N\sum fUV - (\sum fU)(\sum fV)}{\sqrt{N\sum fU^2 - (\sum fU)^2}\,\sqrt{N\sum fV^2 - (\sum fV)^2}}, \quad \text{where } U = X - A \text{ and } V = Y - B$$

Step deviation method (or Change of origin and scale)

$$r = \frac{N\sum fU'V' - (\sum fU')(\sum fV')}{\sqrt{N\sum fU'^2 - (\sum fU')^2}\,\sqrt{N\sum fV'^2 - (\sum fV')^2}}$$

where, N = total frequency, $U' = \frac{X - A}{h}$, $V' = \frac{Y - B}{k}$
A = Assumed mean of variable X, B = Assumed mean of variable Y
h = Class size of variable X, k = Class size of variable Y

Steps

a. List the class intervals of the two variables X and Y, one in the column heading and the other in the row heading.

b. Calculate the mid-points of the class intervals of variables X and Y and then take deviations (or step-deviations) from their assumed means, which are denoted by U and V (or U' and V') respectively.

c. For each class of X, add the frequencies of all cells. Do the same for each class of Y.

d. Multiply the frequency of each class of X by the corresponding value of U and sum the products to obtain $\sum fU$. Similarly obtain $\sum fV$.

e. Again multiply fU by U and fV by V to obtain fU² and fV², and then obtain $\sum fU^2$ and $\sum fV^2$.

f. Multiply f, U and V of each cell and write the figure so obtained in the right-hand corner of each cell.

g. Add all the values in the corners to get the last column (or row) fUV and obtain $\sum fUV$.

h. Substitute all the sums in the formula to calculate the correlation coefficient r.


Example 3.16 Compute the coefficient of correlation between income and expenditure from the following bivariate table.

Income Rs. 


500 - 1000| 1000 - 1500 | 1500 - 2000] 2000 - 2500 
1200 ~ 1600 - 


ee 
Ea a A ie ae Ta 


Solution: Let X-series and Y-series be the expenditure (in Rs.) and income (in Rs.) respectively. Also,

Assumed mean of X-series (A) = 1250
Size of class interval of X-series (h) = 500
Assumed mean of Y-series (B) = 1000
Size of class interval of Y-series (k) = 400

Then, U' and V' are obtained as

$$U' = \frac{X - 1250}{500} \quad \text{and} \quad V' = \frac{Y - 1000}{400}$$
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Income 
mid value 
Mid value | f fv fre UV! 
ep | () ; 
200 | -2 
0-400 2 ? . 
6 52 104 60 
| 
00] 600 
400 30 ~30 30 aH, 15 
sw | 1000 -;| 4) ao | o] o 34 
120 8 10 2 4 ? . 
1200- | 1400 pf | =! o| ee ee 
1600 1 10 2 1 a i 
1600- | ygoo | 2 2 g a) | 2 
2000 . i . 7 6 2 
14 f 
gee | 33 33 11 9 | Sf=100 | sfv=—s6| =fv? = 172 |Sfuv=94 
fu' |-28| —33 0 iu.) ig. | ee 
S| 32 
Ue 33 0 11 36 | sfUP=136 
fuv'| 52 | 29 0 1 12. | spuv=o4 
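Now, Karl Pearson's coefficient is given by

$$r = \frac{N\sum fU'V' - (\sum fU')(\sum fV')}{\sqrt{N\sum fU'^2 - (\sum fU')^2}\,\sqrt{N\sum fV'^2 - (\sum fV')^2}} = \frac{100 \times 94 - (-32) \times (-56)}{\sqrt{100 \times 136 - (-32)^2}\,\sqrt{100 \times 172 - (-56)^2}} = 0.572$$

There is moderate positive correlation between income and expenditure.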


Example 3.17 Calculate the coefficient of correlation from the following bivariate frequency distribution. Also, test the significance of r.


Advertising Expenditure in Rs. 


; 10000- | 15000- 
5000-10000 | | so00 20000 


75-125 | 
125-175 
175-205 | 
225-275 | 
Solution: Let X-series and Y-series be the sales revenue (in Rs. lakhs) and advertising expenditure (in Rs. thousands) respectively. Also,

Assumed mean of X-series (A) = 150, Size of class interval of X-series (h) = 50,
Assumed mean of Y-series (B) = 12.5, Size of class interval of Y-series (k) = 5

Then, U' and V' are obtained as

$$U' = \frac{X - 150}{50} \quad \text{and} \quad V' = \frac{Y - 12.5}{5}$$
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[5-10 [10-15]15-20] 


Adv. Exp (Rs. in '000 


; Mid 
Sales saa in Lakh onlard Ke 
s. 


17 |ZfU" = 45| spy 


=] 


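Karl Pearson's coefficient of correlation is given by

$$r = \frac{N\sum fU'V' - \sum fU' \sum fV'}{\sqrt{N\sum fU'^2 - (\sum fU')^2}\,\sqrt{N\sum fV'^2 - (\sum fV')^2}} = \frac{40(21) - (-17)(10)}{\sqrt{40(45) - (-17)^2}\,\sqrt{40(50) - (10)^2}} = 0.596$$

$$\text{Probable Error (P.E.)} = 0.6745 \times \frac{1 - r^2}{\sqrt{N}} = 0.6745 \times \frac{1 - (0.596)^2}{\sqrt{40}} = 0.0688$$

6 × P.E. = 6 × 0.0688 = 0.4126

Since r = 0.596 is greater than 6 × P.E., the coefficient of correlation (r) between sales revenue and advertising expenditure is significant.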


Example 3.18 Following figures give the ages in years of newly married husbands and wives. Represent the data by a bivariate frequency distribution.
(Age of husband, age of wife): (25, 17), (26, 18), (27, 19), (25, 17), (28, 20), (24, 18), (27, 18), (28, 19), (25, 18), (26, 19), (25, 17), (26, 18), (27, 19), (25, 19), (27, 20), (26, 19), (24, 17), (26, 20), (26, 17), (26, 18)
Also, find Karl Pearson's correlation coefficient and examine the significance of the calculated value.

Solution: Let X and Y be the ages of husband and wife respectively. We observe that the variable X takes the values from 24 to 28 and Y takes the values from 17 to 20. We obtain the bivariate discrete frequency distribution given below, with u = X − 26 and v = Y − 18.

Calculation of Karl Pearson's correlation coefficient

Y \ X | 24 | 25 | 26 | 27 | 28 | f | fv | fv² | fuv
(v \ u) | −2 | −1 | 0 | 1 | 2 | | | |
17 (v = −1) | 1 | 3 | 1 | 0 | 0 | 5 | −5 | 5 | 5
18 (v = 0) | 1 | 1 | 3 | 1 | 0 | 6 | 0 | 0 | 0
19 (v = 1) | 0 | 1 | 2 | 2 | 1 | 6 | 6 | 6 | 3
20 (v = 2) | 0 | 0 | 1 | 1 | 1 | 3 | 6 | 12 | 6
f | 2 | 5 | 7 | 4 | 2 | N = 20 | Σfv = 7 | Σfv² = 23 | Σfuv = 14
fu | −4 | −5 | 0 | 4 | 4 | Σfu = −1 | | |
fu² | 8 | 5 | 0 | 4 | 8 | Σfu² = 25 | | |
fuv | 2 | 2 | 0 | 4 | 6 | Σfuv = 14 | | |

Here, N = 20, $\sum fu = -1$, $\sum fv = 7$, $\sum fu^2 = 25$, $\sum fv^2 = 23$, $\sum fuv = 14$

$$r = \frac{N\sum fuv - \sum fu \sum fv}{\sqrt{N\sum fu^2 - (\sum fu)^2}\,\sqrt{N\sum fv^2 - (\sum fv)^2}} = \frac{20 \times 14 - (-1) \times 7}{\sqrt{20 \times 25 - (-1)^2}\,\sqrt{20 \times 23 - 7^2}} = 0.633$$

Thus, there is positive relationship between husband's age and wife's age.

Now, $P.E. = 0.6745 \times \frac{1 - r^2}{\sqrt{N}} = 0.6745 \times \frac{1 - (0.633)^2}{\sqrt{20}} = 0.0903$

6 P.E. = 6 × 0.0903 = 0.5423

Since r is greater than 6 P.E., the correlation coefficient is significant, i.e. there is evidence of correlation between the ages.
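The tabulation and the shortcut formula for a bivariate frequency distribution can be reproduced with a few lines of Python. The sketch below tallies the (husband, wife) pairs of Example 3.18 into cell frequencies and applies the deviation formula. The assumed means A = 26 and B = 18 and the reading of the ninth pair as (25, 18) are assumptions made for this illustration.

```python
from collections import Counter
from math import sqrt

# (age of husband, age of wife); the ninth pair is read as (25, 18) -- an assumption
pairs = [(25, 17), (26, 18), (27, 19), (25, 17), (28, 20), (24, 18), (27, 18),
         (28, 19), (25, 18), (26, 19), (25, 17), (26, 18), (27, 19), (25, 19),
         (27, 20), (26, 19), (24, 17), (26, 20), (26, 17), (26, 18)]

table = Counter(pairs)      # cell frequency f for every (X, Y) combination
A, B = 26, 18               # assumed means (illustrative choice)

N = sum(table.values())
sfu = sum(f * (x - A) for (x, y), f in table.items())
sfv = sum(f * (y - B) for (x, y), f in table.items())
sfuu = sum(f * (x - A) ** 2 for (x, y), f in table.items())
sfvv = sum(f * (y - B) ** 2 for (x, y), f in table.items())
sfuv = sum(f * (x - A) * (y - B) for (x, y), f in table.items())

r = (N * sfuv - sfu * sfv) / (sqrt(N * sfuu - sfu ** 2) * sqrt(N * sfvv - sfv ** 2))
print(N, sfu, sfv, sfuu, sfvv, sfuv, round(r, 3))
```

Since r does not depend on the choice of origin, any other assumed means would change the intermediate sums but not the final value.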


3.4.3 Coefficient of Determination

The square of the correlation coefficient is called the coefficient of determination. It is denoted by r². The coefficient of correlation measures the degree of linear relationship (association) between two variables, whereas the coefficient of determination measures the percentage of total variation in one variable that has been explained by the variation in the other variable. In other words, the coefficient of determination is defined as the ratio of the explained variance to the total variance. Thus,

$$\text{Coefficient of determination, } r^2 = \frac{\text{Explained variance}}{\text{Total variance}}$$

The coefficient of determination is highly applicable in regression analysis in order to measure the percentage of total variation in the dependent variable that has been explained by the independent variable.

Suppose, r = 0.82, r² = 0.6724

It implies that 67.24% of total variation in the dependent variable has been explained by the independent variable and the remaining (100 − 67.24) = 32.76% of the variation is due to other factors.

Note: The coefficient of determination is always positive, so it does not tell us about the direction of the relationship, whether it is positive or negative, between the two variables. The coefficient of non-determination is usually denoted by K² and is given by

$$K^2 = \frac{\text{Unexplained variance}}{\text{Total variance}} = 1 - r^2$$
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3.5 Rank Correlation 


In practical problems, we may be faced with the problem of computing correlation between variables which are not quantitative in nature. For example, correlation between honesty and smartness among a group of students. Here the variables honesty and smartness are not measurable. These are qualitative in nature. But ranking is possible in case of qualitative variables.

Karl Pearson's coefficient of correlation cannot measure the correlation between variables which are qualitative in nature. For this situation, British psychologist Charles Edward Spearman developed a formula to obtain the correlation coefficient between the ranks of the variables under study. This formula (method) works for both quantitative and qualitative variables.


3.5.1 Methods of Studying Rank Correlation Coefficient

There are three cases while computing Spearman's rank correlation coefficient:

Case (1): When the Actual Ranks are given

When the actual ranks are given, the following steps have to be followed:

(i) Find the difference of ranks $d = R_1 - R_2$.
(ii) Compute d² to get $\sum d^2$.
(iii) Then, find the rank correlation coefficient by using the formula:

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where,
$R_1$ = Ranks of items of one variable, $R_2$ = Ranks of items of second variable,
n = Number of pairs of observations, $d = R_1 - R_2$ = Difference between the pair of ranks.


Example 3.19 Ten industries of some state have been ranked as follows according to profit earned and working capital for that year:

Solution: Computation of rank correlation coefficient

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 36}{10(100 - 1)} = 0.782$$

There is high degree of positive correlation between the ranks of profit and working capital.

Example 3.20 Ten competitors in a beauty contest are ranked by three judges in the following orders:

First Judge | 1 | 6 | 5 | 10 | 3 | 2 | 4 | 9 | 7 | 8
Second Judge | 3 | 5 | 8 | 4 | 7 | 10 | 2 | 1 | 6 | 9
Third Judge | 6 | 4 | 9 | 8 | 1 | 2 | 3 | 10 | 5 | 7


Use the method of rank correlation coefficient to determine which pair of judges has the nearest approach to common taste in beauty.

Solution: To find out which pair of judges has the nearest approach to common taste in beauty, we compare the rank correlation between the judges:

a) 1st judge and 2nd judge

b) 1st judge and 3rd judge

c) 2nd judge and 3rd judge

Let $R_1$ = Rank given by 1st judge, $R_2$ = Rank given by 2nd judge, $R_3$ = Rank given by 3rd judge

Computation of Rank Correlation Coefficient

n = 10

Rank correlation coefficient between the ranks of first and second judges

$$R_{12} = 1 - \frac{6\sum d_{12}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 200}{10(10^2 - 1)} = -0.212$$

Similarly, rank correlation coefficient between the ranks of first and third judges

$$R_{13} = 1 - \frac{6\sum d_{13}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 60}{10(10^2 - 1)} = 0.636$$

Rank correlation coefficient between the ranks of second and third judges

$$R_{23} = 1 - \frac{6\sum d_{23}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 214}{10(10^2 - 1)} = -0.297$$

Since the value of $R_{13}$ is the highest, the pair of first and third judges has the nearest approach to common taste in beauty.
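The pairwise comparison of judges can be sketched in Python as below. The three rank lists are illustrative assumptions, chosen so that the sums of squared rank differences equal the values 200, 60 and 214 used in the solution; the function name is hypothetical.

```python
from itertools import combinations

def spearman_from_ranks(r1, r2):
    """R = 1 - 6 * sum(d^2) / (n (n^2 - 1)) for two untied rank lists."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Illustrative rankings (assumed), consistent with sum d^2 = 200, 60, 214
judges = {
    1: [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    2: [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    3: [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}

results = {(i, j): spearman_from_ranks(judges[i], judges[j])
           for i, j in combinations(judges, 2)}
closest_pair = max(results, key=results.get)   # pair with the highest R

for pair, value in results.items():
    print(pair, round(value, 3))
print("nearest approach:", closest_pair)
```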


Example 3.21 The coefficient of rank correlation of marks obtained by 10 students in Statistics and Accountancy was found to be 0.2. It was later discovered that the difference in ranks in the two subjects obtained by one of the students was wrongly taken as 9 instead of 7. Find the correct value of the coefficient of rank correlation.

Solution: We have, R = 0.2, n = 10

We know,

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \Rightarrow 0.2 = 1 - \frac{6\sum d^2}{10(10^2 - 1)}$$

or, $\frac{6\sum d^2}{990} = 0.8$

or, $\sum d^2 = 132$

Then, Corrected $\sum d^2$ = 132 − (9)² + (7)² = 100

$$\text{Corrected } R = 1 - \frac{6 \times \text{corrected } \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 100}{10(10^2 - 1)} = 0.394$$

Example 3.22 The coefficient of rank correlation between the debenture prices and share prices of a company was +0.8. If the sum of the squares of the difference in ranks was 33, find the value of n.

Solution: Here, R = 0.8, $\sum d^2 = 33$, n = ?

We have $R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \Rightarrow 0.8 = 1 - \frac{6 \times 33}{n(n^2 - 1)}$

or, $\frac{198}{n(n^2 - 1)} = 1 - 0.8$

or, $\frac{198}{0.2} = n(n^2 - 1)$

or, $990 = n^3 - n$

or, $n^3 - 990 - n = 0$

or, $n^3 - 1000 - n + 10 = 0$

or, $n^3 - 10^3 - (n - 10) = 0$

or, $(n - 10)(n^2 + 10n + 100) - (n - 10) = 0$

or, $(n - 10)(n^2 + 10n + 100 - 1) = 0$

or, $(n - 10)(n^2 + 10n + 99) = 0$

Either, n = 10

or, $n^2 + 10n + 99 = 0$, which gives $n = \frac{-10 \pm \sqrt{10^2 - 4 \cdot 1 \cdot 99}}{2 \cdot 1} = \frac{-10 \pm \sqrt{-296}}{2}$, which is imaginary.

Alternatively,
990 = n(n² − 1)
9 × 10 × 11 = (n − 1) × n × (n + 1)
n = 10

Hence, n = 10.

Case (2): When Actual Ranks are not given

When the actual values of the variables are given but not the ranks, it will be necessary to assign ranks.

Then, the following steps have to be followed:

(i) At first, rank the given observations in ascending or descending order.
(ii) Find the difference of ranks $d = R_1 - R_2$.
(iii) Compute d² to get $\sum d^2$.
(iv) Then, compute the rank correlation coefficient by using the formula:

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where,
$d = R_1 - R_2$ = Difference between the pair of ranks
$R_1$ = Ranks of items of one variable
$R_2$ = Ranks of items of second variable
n = Number of paired observations
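Steps (i) to (iv) for Case (2) can be sketched in Python as follows. The sketch assumes that no value is repeated within a series; the two data lists are hypothetical, and ranking is done in descending order (rank 1 for the largest value).

```python
def ranks_descending(values):
    """Step (i): rank 1 goes to the largest value (assumes no repeated values)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_from_values(xs, ys):
    """Steps (ii) to (iv): differences of ranks, sum of d^2, then R."""
    n = len(xs)
    r1 = ranks_descending(xs)
    r2 = ranks_descending(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical data, for illustration only
X = [50, 60, 45, 70, 55]
Y = [20, 35, 30, 40, 25]

print(ranks_descending(X))   # [4, 2, 5, 1, 3]
print(ranks_descending(Y))   # [5, 2, 3, 1, 4]
R = spearman_from_values(X, Y)
print(round(R, 3))           # 0.7
```

Ranking both series in ascending order instead would give the same value of R, since every difference d would only change sign.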


Example 3.23 Calculate the coefficient of correlation using Spearman's rank correlation coefficient between supply and demand given in the following table. Also comment on your result.

Solution: Let $R_1$ denote the rank of supply (X) and $R_2$ denote the rank of demand (Y).

Computation of Rank Correlation Coefficient

Here, $\sum d = 0$ and $\sum d^2 = 216$.

Rank correlation coefficient is

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 216}{n(n^2 - 1)}$$

There is low degree of negative correlation between the ranks of supply and demand.


Case (3): When the Ranks are Repeated (Tied ranks case)

If two or more values of a variable are equal in any classification with respect to characteristics (attributes), common ranks are given to the repeated items. The common rank is the average (i.e. arithmetic mean) of the ranks which these items would have received. For example, suppose an item is repeated at rank 3 (i.e. 3rd place) twice, then the common rank to be assigned to each item is $\frac{3 + 4}{2} = 3.5$ and the next value is assigned the rank 5. Similarly, if an item is repeated thrice at rank 7 (i.e. 7th place), the common rank to be assigned to each value will be $\frac{7 + 8 + 9}{3} = 8$ and the next value is assigned the rank 10, and so on. As a result, the following correction factor or adjustment factor is made in Spearman's rank correlation coefficient, i.e. the correction factor $\frac{m(m^2 - 1)}{12}$ is added to $\sum d^2$ in the rank correlation formula. This correction factor is to be added for each repeated value in both variables. Hence, for repeated ranks, the Spearman rank correlation coefficient is given by the following formula:

$$R = 1 - \frac{6\left[\sum d^2 + \frac{m_1(m_1^2 - 1)}{12} + \frac{m_2(m_2^2 - 1)}{12} + \ldots\right]}{n(n^2 - 1)}$$

where $m_1, m_2, m_3, \ldots$ denote the number of times that an item is repeated.
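A minimal Python sketch of the tied-ranks procedure is given below: repeated values receive the average of the ranks they occupy, and the correction factor m(m² − 1)/12 is added to Σd² once for every group of repeated values in either series. The function names and the small data set are hypothetical.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = largest value; repeated values share the mean of their ranks."""
    order = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first = order.index(v) + 1            # first place occupied by v
        m = order.count(v)                    # number of times v is repeated
        rank_of[v] = first + (m - 1) / 2      # mean of first, ..., first + m - 1
    return [rank_of[v] for v in values]

def tie_correction(values):
    """Sum of m (m^2 - 1) / 12 over every value repeated m > 1 times."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_tied(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    cf = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (d2 + cf) / (n * (n * n - 1))

# Hypothetical data: the value 80 is repeated twice in the first series
xs = [90, 80, 80, 70]
ys = [10, 9, 8, 7]
print(average_ranks(xs))               # [1.0, 2.5, 2.5, 4.0]
R = spearman_tied(xs, ys)
print(round(R, 3))                     # 0.9
```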


Example 3.24 An examination of 10 applicants was taken by a firm. From the marks obtained by the applicants in Statistics and Accountancy papers, calculate the rank correlation coefficient from the following data.

Solution:
Let $R_1$ denote the rank of marks in Statistics (X) and $R_2$ denote the rank of marks in Accountancy (Y).

Here, n = 10, $m_1$ = 3, $m_2$ = 2, $m_3$ = 2, $\sum d^2 = 116$

Rank correlation coefficient is given by

$$R = 1 - \frac{6\left[\sum d^2 + \frac{m_1(m_1^2 - 1)}{12} + \frac{m_2(m_2^2 - 1)}{12} + \frac{m_3(m_3^2 - 1)}{12}\right]}{n(n^2 - 1)}$$

$$= 1 - \frac{6\left[116 + \frac{3(3^2 - 1)}{12} + \frac{2(2^2 - 1)}{12} + \frac{2(2^2 - 1)}{12}\right]}{10(10^2 - 1)} = 1 - 0.7212 = 0.2788$$

There is low degree of positive correlation between marks in Statistics and Accountancy.

c. When R = +1, it indicates complete agreement in the order of ranks between the two variables (attributes).

d. When R = −1, it indicates complete disagreement in the order of ranks between the two variables (attributes).


Merits and Demerits of Spearman Rank Correlation Method

Merits of rank correlation are as follows:

a. It is appropriate for qualitative data such as honesty, efficiency, intelligence, beauty, ability etc.

b. This method is simple to understand and easy to apply as compared to Karl Pearson's method.

c. This is the only method that can be used where we are given the ranks and not the actual data.

Demerits of rank correlation are as follows:

a. This method cannot be used for finding out correlation in a grouped frequency distribution.

b. When the number of items is large, the calculations become quite tedious and time consuming.


Distinction between Karl Pearson's coefficient of correlation and Spearman's rank correlation coefficient

Karl Pearson's coefficient of correlation | Spearman's rank correlation coefficient
1. Karl Pearson's correlation evaluates the linear relationship between two continuous variables. A relationship is linear when a change in one variable is associated with a proportional change in the other variable. | 1. Spearman's correlation evaluates the monotonic relationship between two continuous or ordinal variables. In a monotonic relationship, the variables tend to change together but not necessarily at a constant rate. Spearman's rank correlation coefficient is based on the rank values of each variable rather than the raw data.
2. Karl Pearson's correlation is often used to evaluate relationships involving continuous variables. | 2. Spearman's correlation is used to evaluate relationships involving ordinal variables.
3. Karl Pearson's correlation coefficient is used in regression analysis. | 3. But Spearman's rank correlation cannot be used.
4. It measures the strength of the linear relationship between normally distributed variables. | 4. When the variables are not normally distributed or the relationship between the variables is not linear, it is more appropriate to use Spearman's rank correlation.
5. Pearson's correlation coefficient between variables is defined as the covariance of the two variables divided by the product of their standard deviations. | 5. The Spearman correlation coefficient is defined as the Pearson correlation coefficient between the ranked variables.


3.6 Regression Analysis 


The theory of regression analysis was first developed by the British biometrician Sir Francis Galton. The literal or dictionary meaning of the word "regression" is "stepping back" or "returning towards the average". He used the term "regression" in connection with his research on heredity, on estimating the nature of the relationship between the heights of fathers and sons. He found that offspring of abnormally short or tall parents tend to "regress" or "step back" towards the average height of the general population. Galton studied the average relationship between these two variables graphically and called the resulting line the line of regression. But the term "regression" as now used in Statistics is only a convenient term without any reference to biometry. Nowadays, it is widely used in business and economics. In Statistics, the concept of regression analysis is applicable to all those fields where two or more variables have a tendency to move back to the average behaviour.

In statistics, regression analysis is concerned with the measure of the average relationship between the variables. Regression explains the nature of the relationship between variables. Thus, it can be said that regression is the estimation or prediction of one variable's value from the given value of other variables.

In regression analysis, there are two variables: dependent and independent. The variable whose value is to be predicted is called the dependent variable. It is also known as the regressed, predicted or explained variable. On the other hand, the variable which influences the value of the dependent variable, or whose value is used for prediction or estimation of the dependent variable, is called the independent variable. It is also known as the regressor, predictor or explanatory variable. Thus prediction or estimation is possible in regression analysis.

Prediction or estimation is a routine activity, for example the estimation of future production, consumption, prices, sales, profits etc. Regression analysis is one of the most scientific techniques for making such predictions, which are of paramount importance to a manager, decision maker, businessman or economist.

3.6.1 Uses of Regression Analysis 
The tools of regression analysis are widely used in statistics. Some of the important uses of regression analysis are as follows:
i) Regression analysis helps in establishing the relationship between dependent and independent variables.
ii) Regression analysis is very useful for prediction. For example, prediction of sales, profit, income, population etc.
iii) A very important branch of economics, called 'Econometrics', is solely based on the techniques of regression analysis.
iv) The averages and the correlation co-efficient between two variables can be obtained easily by using the regression lines.
v) In the social and economic field, it is used for projection of population, birth rates, death rates, marital status, planning etc.
vi) In the business field, it is widely used, for example to predict production, consumption, investment, prices, sales etc.
vii) It can be used to estimate the unknown value of a variable (dependent variable) from the given value of another variable (independent variable) when the two are interrelated.
viii) Regression analysis helps to explore the cause and effect relationship between variables.


3.6.2 Types of Regression


(i) Simple Regression

In simple regression only two variables are studied. The simple regression is called linear regression if the points in the scatter diagram lie almost along a straight line; otherwise, if they do not lie along a straight line, it is called non-linear (curvilinear) regression. In the non-linear case the regression equation will be a functional relationship between X and Y involving terms in X and Y of degree higher than one, i.e. involving terms of the type X², Y², XY, etc. However, in this chapter we will discuss linear regression between two variables only.


3.6.3 Comparison between Correlation and Regression 

There are some differences between the statistical techniques of correlation and regression, which are as follows:

Correlation | Regression
1. Correlation is a statistical tool used to study or describe the degree to which the variables are linearly related. | 1. Regression analysis is the mathematical measure of the average relationship between two or more variables in terms of the original units of data, whether the variables are linearly or non-linearly related.
2. It does not necessarily imply a cause and effect relationship between the two variables under study. | 2. It is framed as a cause and effect relationship between the variables, such that the cause is taken as the independent variable and the effect as the dependent variable.
3. Correlation coefficient, i.e. r_xy, is a relative measure of linear relationship between two variables and is independent of the units of measurement. | 3. Regression coefficients b_yx and b_xy are ratio measures, so they carry the units of measurement of the variables.
4. Correlation coefficients are symmetric, i.e. r_xy = r_yx. | 4. Regression coefficients are not symmetric, i.e. b_yx ≠ b_xy.
5. Correlation analysis is confined only to the study of linear relationship between the variables. | 5. Regression analysis studies linear as well as non-linear (curvilinear) relationship between the variables.
6. Correlation coefficient is a pure number lying between -1 and +1. | 6. Regression coefficients are ratio measures, and if one of the regression coefficients is greater than unity (one), the other must be less than unity, i.e. b_yx × b_xy ≤ 1.
7. Correlation coefficient is independent of change of origin and scale. | 7. Regression coefficients are independent of change of origin but not of scale.

Further, the regression coefficient of Y on X is expressed in units of the dependent variable per unit of the independent variable, and the regression line can be used for prediction.


Regression Lines 


A line of regression gives the best estimate of one variable for any given value of the other variable. If two variables x and y are under consideration, there are two lines of regression: one is the line of regression of y on x and the other is the line of regression of x on y. The line used to estimate the value of x for a given value of y is called the regression line of x on y. Similarly, the line used to estimate the value of y for a given value of x is called the regression line of y on x.

In case of perfect correlation between the variables, the two regression lines will be coincident. The angle between the regression lines increases from 0° to 90° as the correlation co-efficient numerically decreases from 1 to 0. If the two regression lines are perpendicular, the pair of variables has correlation co-efficient 0. The regression lines are determined by using the principle of least squares.

Note: The regression lines of y on x and x on y intersect at the point $(\bar{x}, \bar{y})$.


Regression Line and Regression Co-efficient

The regression lines in terms of algebraic expressions (linear equations of two variables) are known as the regression equations. A line of regression is the line which gives the best estimate of one variable for any given value of the other variable. In case of simple regression only two variables X and Y are studied. If X and Y are two variables, the algebraic expressions of the regression lines in terms of X and Y are called regression equations (lines).


3.7 Types of Variable 
There are four main types of variables:
(i) the dependent variable,
(ii) the independent variable,
(iii) the moderating variable, and
(iv) the intervening variable.


3.7.1 Dependent Variable 

A variable is called dependent variable if its values depend upon the other variable(s). The
investigator's purpose is to study, analyze and predict the variability in the dependent variable. What 
would be the result in the dependent variable if certain changes appear in other related variables? The 
investigator is interested in measuring this variability in the dependent variable. Hence, the variable that is 
used to describe or measure the problem under study is called the dependent variable. A few examples of 
the dependent variable are as follows. 


3.7.2 Independent Variable 


A variable is called independent variable if it is not influenced by any other variable under study. It, however, influences the dependent variable either positively or negatively, which leads to changes (increase or decrease) in the dependent variable. In other words, any change in the dependent variable is due to a change in the independent variable. Thus, independent variables are those which are used as the basis for predicting the dependent variable. You would be interested in finding out how changes in the independent variable affect the dependent variable. The examples of independent variables are as follows.


Regression Equation of Y on X

The regression equation of Y on X describes the variation in the values of the dependent variable Y for given changes in the independent variable X. So the line of regression of Y on X is the line which gives the best estimate for the value of Y for any specified value of X.

Since the line of regression is the line of best fit, the term 'best fit' is interpreted in accordance with the principle of least squares, which consists in minimizing the sum of squares of the residuals or errors of estimates, i.e. the deviations between the given observed values of the variable and their corresponding estimated values as given by the line of best fit. The sum of squares of the errors either parallel to the Y-axis or parallel to the X-axis may be minimized. The equation of the line of regression of Y on X is obtained by minimizing the sum of squares of errors parallel to the Y-axis.

The regression equation of Y on X is given by $Y = a + bX$, where 'a' and 'b' are constants or parameters to be determined to find the position of the regression line, as shown below.


Let the regression equation of Y on X be
$$Y = a + bX \qquad \text{... (i)}$$
The least square normal equations are
$$\sum Y = na + b\sum X \qquad \text{... (ii)}$$
$$\sum XY = a\sum X + b\sum X^2 \qquad \text{... (iii)}$$
where,
Y = dependent variable,
X = independent variable,
a = value of Y when X = 0 = intercept of the line (Y-intercept),
b = $b_{yx}$ = regression coefficient of Y on X = slope of the line = rate of change in Y due to a unit change in X.

The parameter 'a' determines the distance of the line above or below the origin and 'b' the slope of the line. By using the technique of least squares, the parameters 'a' and 'b' can be obtained by solving the two normal equations.

From equation (ii),
$$\sum Y = na + b\sum X \;\Rightarrow\; \frac{\sum Y}{n} = a + b\,\frac{\sum X}{n} \;\Rightarrow\; a = \bar{Y} - b\bar{X}$$
Putting $a = \bar{Y} - b\bar{X}$ in equation (iii), we get
$$\sum XY = (\bar{Y} - b\bar{X})\sum X + b\sum X^2 = \frac{\sum X \sum Y}{n} - b\,\frac{(\sum X)^2}{n} + b\sum X^2$$
i.e.
$$b_{yx} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$$
Alternatively, the regression equation of Y on X is
$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$

[Figure: the line of best fit $y = a + bx$, with an observed point $P(x_i, y_i)$ and the corresponding point $H(x_i, a + bx_i)$ on the line.]


Regression Equation of X on Y

The regression equation of X on Y describes the variation in the values of the dependent variable X for given changes in the independent variable Y. Thus, the line of regression of X on Y is the line which gives the best estimate for the value of X for any specified value of Y. The equation of the line of regression of X on Y is obtained by minimizing the sum of squares of errors parallel to the X-axis.

Let the regression equation of X on Y be
$$X = a + bY \qquad \text{... (i)}$$
The least square normal equations are
$$\sum X = na + b\sum Y \qquad \text{... (ii)}$$
$$\sum XY = a\sum Y + b\sum Y^2 \qquad \text{... (iii)}$$
where
X = dependent variable, Y = independent variable,
a = value of X when Y = 0 = intercept of the line (X-intercept),
b = $b_{xy}$ = regression coefficient of X on Y = slope of the line = rate of change in X due to a unit change in Y.

By using the technique of least squares, the parameters 'a' and 'b' can be obtained by solving the two normal equations.

From equation (ii),
$$\sum X = na + b\sum Y \;\Rightarrow\; a = \frac{\sum X}{n} - b\,\frac{\sum Y}{n} = \bar{X} - b\bar{Y}$$
Substituting this value of a in equation (iii), we get
$$\sum XY = \left(\frac{\sum X}{n} - b\,\frac{\sum Y}{n}\right)\sum Y + b\sum Y^2 = \frac{\sum X \sum Y}{n} - b\,\frac{(\sum Y)^2}{n} + b\sum Y^2$$
$$\Rightarrow\; n\sum XY - \sum X \sum Y = b\left[n\sum Y^2 - (\sum Y)^2\right]$$
i.e.
$$b_{xy} = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - (\sum Y)^2}$$
Alternatively, the regression equation of X on Y is
$$X - \bar{X} = b_{xy}(Y - \bar{Y})$$

In the above regression equations $b_{yx}$ and $b_{xy}$ are called the regression co-efficients of Y on X and X on Y respectively. In terms of the correlation co-efficient and the standard deviations, $b_{yx} = r\,\sigma_y/\sigma_x$ and $b_{xy} = r\,\sigma_x/\sigma_y$. Both regression co-efficients always have the same algebraic sign. The regression co-efficients determine the correlation co-efficient by the relation $r = \pm\sqrt{b_{yx} \times b_{xy}}$. Since $|r| \le 1$, the product of the regression co-efficients must be less than or equal to 1, i.e. $b_{yx} \cdot b_{xy} \le 1$.


Properties of Regression Coefficients

Let X and Y be two variables with regression coefficients of Y on X, i.e. $b_{yx}$, and of X on Y, i.e. $b_{xy}$. Then

1. The correlation coefficient is the geometric mean between the regression coefficients.
OR
Geometric mean between two regression coefficients is equal to the correlation coefficient.
Mathematically, $r = \pm\sqrt{b_{yx} \cdot b_{xy}}$

2. If one of the regression coefficients is greater than unity (one), the other must be less than unity because $b_{yx} \cdot b_{xy} \le 1$.

3. Arithmetic mean between two regression coefficients is greater than or equal to the correlation coefficient,
i.e. $\frac{1}{2}(b_{yx} + b_{xy}) \ge r$

4. Regression coefficients are independent of change of origin but not of scale.
Symbolically, if $U = \frac{X - a}{h}$ and $V = \frac{Y - b}{k}$, then $b_{yx} = \frac{k}{h}\, b_{vu}$ and $b_{xy} = \frac{h}{k}\, b_{uv}$, where a, b, h (> 0) and k (> 0) are constants.

5. Both regression coefficients must have the same sign. The sign of the correlation coefficient is the same as that of the regression coefficients.

6. Regression lines (equations) always pass through the mean values $(\bar{X}, \bar{Y})$, which is also the intersection point of the two regression lines.

7. If r = ±1, the regression lines become identical.
8. If r = 0, the regression lines are perpendicular to each other.
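The least-squares formulas and properties 1 and 6 can be checked numerically. The following Python sketch fits both regression lines to a small data set; the data values and the helper name `fit_line` are assumptions made only for illustration.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares line ys = a + b*xs, obtained from the two normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope (regression coefficient)
    a = sy / n - b * sx / n                        # intercept = mean(ys) - b * mean(xs)
    return a, b

# Hypothetical observations, not taken from the text.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

a_yx, b_yx = fit_line(X, Y)  # regression of Y on X
a_xy, b_xy = fit_line(Y, X)  # regression of X on Y
r = sqrt(b_yx * b_xy)        # both slopes are positive here, so r is positive

mean_x, mean_y = sum(X) / len(X), sum(Y) / len(Y)
print(f"Y on X: Y = {a_yx:.2f} + {b_yx:.2f} X")
print(f"X on Y: X = {a_xy:.2f} + {b_xy:.2f} Y")
print(f"r = {r:.3f}, product of coefficients = {b_yx * b_xy:.3f}")
```

Both fitted lines pass through the point of means, and the product of the two coefficients does not exceed 1.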


Example 3.25: Find the equations of the two lines of regression if the following results were obtained for 5 pairs of observations.

$\sum X = 15,\ \sum Y = 18,\ \sum X^2 = 55,\ \sum Y^2 = 74,\ \sum XY = 58$

Solution: The regression equation of X on Y is given by
$X = a + bY$ ... (i)
The values of a and b are obtained using the principle of least squares as below.
$\sum X = na + b\sum Y$
or, $15 = 5a + 18b$ ... (ii)
and $\sum XY = a\sum Y + b\sum Y^2$
or, $58 = 18a + 74b$ ... (iii)
Solving equations (ii) and (iii), we get $b = 20/46 \approx 0.43$; taking $b = 0.43$ in (ii) gives $a = (15 - 18 \times 0.43)/5 \approx 1.45$.
Putting the values of a and b in equation (i), we get
$X = 1.45 + 0.43Y$

Also the regression equation of Y on X is given by
$Y = a + bX$ ... (iv)
The values of a and b are obtained using the principle of least squares as below.
$\sum Y = na + b\sum X$
or, $18 = 5a + 15b$ ... (v)
and $\sum XY = a\sum X + b\sum X^2$
or, $58 = 15a + 55b$ ... (vi)
Solving (v) and (vi), we get
$a = 2.4$ and $b = 0.4$
Putting the values of a and b in equation (iv), we get
$Y = 2.4 + 0.4X$
∴ The lines of regression are $X = 0.43Y + 1.45$ and $Y = 0.4X + 2.4$.

Alternatively, $\bar{X} = \frac{\sum X}{n} = \frac{15}{5} = 3$ and $\bar{Y} = \frac{\sum Y}{n} = \frac{18}{5} = 3.6$

Regression co-efficient of Y on X is given by
$$b_{yx} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{5 \times 58 - 15 \times 18}{5 \times 55 - (15)^2} = 0.4$$
and regression co-efficient of X on Y is given by
$$b_{xy} = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - (\sum Y)^2} = \frac{5 \times 58 - 15 \times 18}{5 \times 74 - (18)^2} = 0.43$$

Now, the regression equation of Y on X is
$Y - \bar{Y} = b_{yx}(X - \bar{X})$
⇒ $Y - 3.6 = 0.4(X - 3)$
⇒ $Y - 3.6 = 0.4X - 1.2$
∴ $Y = 0.4X + 2.4$

And the regression equation of X on Y is
$X - \bar{X} = b_{xy}(Y - \bar{Y})$
⇒ $X - 3 = 0.43(Y - 3.6)$
⇒ $X - 3 = 0.43Y - 1.548$
∴ $X = 0.43Y + 1.45$
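The arithmetic of Example 3.25 can be reproduced from the summary totals alone. The sketch below is a minimal Python illustration; the function name `regression_from_sums` is an assumption. It works with unrounded coefficients, so the intercept of the X on Y line comes out as about 1.43 rather than the 1.45 obtained above after rounding b to 0.43.

```python
def regression_from_sums(n, sx, sy, sxx, syy, sxy):
    """Return (a_yx, b_yx, a_xy, b_xy) from n, ΣX, ΣY, ΣX², ΣY², ΣXY."""
    num = n * sxy - sx * sy
    b_yx = num / (n * sxx - sx ** 2)  # regression coefficient of Y on X
    b_xy = num / (n * syy - sy ** 2)  # regression coefficient of X on Y
    mean_x, mean_y = sx / n, sy / n
    a_yx = mean_y - b_yx * mean_x     # intercept of Y = a + bX
    a_xy = mean_x - b_xy * mean_y     # intercept of X = a + bY
    return a_yx, b_yx, a_xy, b_xy

# Totals given in Example 3.25.
a_yx, b_yx, a_xy, b_xy = regression_from_sums(5, 15, 18, 55, 74, 58)
print(f"Y = {b_yx:.2f} X + {a_yx:.2f}")
print(f"X = {b_xy:.2f} Y + {a_xy:.2f}")
```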


Example 3.26: Given that x and y are correlated variables. Ten observations of values of (x, y) have the following results: $\sum x = 55,\ \sum y = 55,\ \sum xy = 350,\ \sum x^2 = 385$. Estimate the value of y when the value of x is 6.

Solution: We have to find the value of y when x = 6, which requires the regression equation of y on x. For this, the regression co-efficient of y on x is given by
$$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{10 \times 350 - 55 \times 55}{10 \times 385 - (55)^2} = 0.58$$
Here $\bar{x} = \frac{55}{10} = 5.5$ and $\bar{y} = \frac{55}{10} = 5.5$.

∴ The regression equation of y on x is given by
$y - \bar{y} = b_{yx}(x - \bar{x})$
or, $y - 5.5 = 0.58(x - 5.5)$
∴ $y = 0.58x + 2.31$
Now, when x = 6, the estimated value of y is
$y = 0.58 \times 6 + 2.31 = 5.79$


Example 3.27: From the following data

 | Average | Standard deviation
Price of commodity (Rs) | 36 | 11
Demand of commodity (suitable units) | 85 | 8

Co-efficient of correlation = 0.66

i) Find the equations of the regression lines.
ii) Estimate the likely price of the commodity when the quantity of commodity demanded is 75.

Solution: Let x denote the price of the commodity and y denote the quantity of commodity demanded.
Here, $\bar{x} = 36,\ \bar{y} = 85,\ \sigma_x = 11,\ \sigma_y = 8$ and $r = 0.66$
i) Regression co-efficient of y on x is given by
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.66 \times \frac{8}{11} = 0.48$$
and regression co-efficient of x on y is given by
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.66 \times \frac{11}{8} = 0.91$$
Now the regression equation of y on x is
$y - \bar{y} = b_{yx}(x - \bar{x})$
$y - 85 = 0.48(x - 36)$
∴ $y = 0.48x + 67.72$
and the regression equation of x on y is
$x - \bar{x} = b_{xy}(y - \bar{y})$
$x - 36 = 0.91(y - 85)$
∴ $x = 0.91y - 41.35$

ii) To estimate the value of x when y = 75, put y = 75 in the regression equation of x on y,
i.e. $x = 0.91y - 41.35 = 0.91 \times 75 - 41.35 = 26.9$

∴ The most likely price of the commodity is Rs 26.9 when the quantity of commodity demanded is 75.
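When only the means, standard deviations and correlation co-efficient are known, as in Example 3.27, the regression co-efficients follow from $b_{yx} = r\,\sigma_y/\sigma_x$ and $b_{xy} = r\,\sigma_x/\sigma_y$. A minimal Python sketch (the function names are assumptions made for illustration; no intermediate rounding is applied, so the estimate is 26.925 rather than 26.9):

```python
def coefficients_from_summary(r, sd_x, sd_y):
    """Return (b_yx, b_xy) from r and the two standard deviations."""
    return r * sd_y / sd_x, r * sd_x / sd_y

def estimate_x(y, mean_x, mean_y, b_xy):
    """Estimate x from y using the regression line of x on y."""
    return mean_x + b_xy * (y - mean_y)

# Values from Example 3.27: x = price, y = quantity demanded.
b_yx, b_xy = coefficients_from_summary(r=0.66, sd_x=11, sd_y=8)
price = estimate_x(75, mean_x=36, mean_y=85, b_xy=b_xy)
print(f"b_yx = {b_yx:.4f}, b_xy = {b_xy:.4f}, estimated price = {price:.3f}")
```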


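Example 3.28: Find the correlation co-efficient and regression equations for the following data.

[Data table of age and blood pressure for 8 persons; only part of the blood pressure row (125, 156, 107, 136) survives.]

Solution: Let x denote age and y denote B.P.
Again let the assumed mean of x-series (a) = 80 and the assumed mean of y-series (b) = 130, so that u = x − 80 and v = y − 130. From the calculation table, $n = 8,\ \sum u = -43,\ \sum v = -36,\ \sum uv = 1707,\ \sum u^2 = 1463,\ \sum v^2 = 2172$.

$$r = \frac{n\sum uv - \sum u \sum v}{\sqrt{n\sum u^2 - (\sum u)^2}\,\sqrt{n\sum v^2 - (\sum v)^2}} = \frac{8 \times 1707 - (-43)(-36)}{\sqrt{8 \times 1463 - (-43)^2}\,\sqrt{8 \times 2172 - (-36)^2}} = 0.9618$$

For regression lines,
$\bar{x} = a + \frac{\sum u}{n} = 80 + \frac{(-43)}{8} = 74.63$
and $\bar{y} = b + \frac{\sum v}{n} = 130 + \frac{(-36)}{8} = 125.5$

Regression co-efficient of y on x is given by
$$b_{yx} = \frac{n\sum uv - \sum u \sum v}{n\sum u^2 - (\sum u)^2} = \frac{8 \times 1707 - (-43)(-36)}{8 \times 1463 - (-43)^2} = 1.22 \text{ (truncated to two decimals)}$$
Thus, the regression equation of y on x is
$y - \bar{y} = b_{yx}(x - \bar{x})$
⇒ $y - 125.5 = 1.22(x - 74.63)$
∴ $y = 1.22x + 34.45$
Again, regression co-efficient of x on y is given by
$$b_{xy} = \frac{n\sum uv - \sum u \sum v}{n\sum v^2 - (\sum v)^2} = \frac{8 \times 1707 - (-43)(-36)}{8 \times 2172 - (-36)^2} = 0.75$$
and the regression equation of x on y is
⇒ $x - \bar{x} = b_{xy}(y - \bar{y})$
⇒ $x - 74.63 = 0.75(y - 125.5)$
⇒ $x = 0.75y - 19.49$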


Example 3.29: Find the two regression equations from the following marks in Statistics and Mathematics obtained by 8 students.

[Data table of marks in Statistics (X) and Mathematics (Y) for 8 students; the values are not recoverable here.]

i) Estimate the marks in Mathematics when the marks in Statistics is 65.
ii) Find the correlation coefficient between marks in Statistics and Mathematics.

Solution: Regression equations of Y on X and X on Y are given by
$Y - \bar{Y} = b_{yx}(X - \bar{X})$
$X - \bar{X} = b_{xy}(Y - \bar{Y})$

Calculation of regression equations

[Calculation table with deviations $x = X - \bar{X}$, $y = Y - \bar{Y}$ and their squares and products; surviving totals: $\sum x^2 = 36$, $\sum xy = 24$.]

Now,
$\bar{X} = \frac{\sum X}{n} = \frac{480}{8} = 60$, $\bar{Y} = \frac{\sum Y}{n} = \frac{632}{8} = 79$

Regression coefficient of Y on X
$b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{24}{36} = 0.667$

Regression coefficient of X on Y
$b_{xy} = \frac{\sum xy}{\sum y^2} = \frac{24}{32} = 0.75$

Regression equation of Y on X is given by
$Y - \bar{Y} = b_{yx}(X - \bar{X})$
or, $Y - 79 = 0.667(X - 60)$
or, $Y = 79 + 0.667X - 40.02$
or, $Y = 38.98 + 0.667X$

Regression equation of X on Y is given by
$X - \bar{X} = b_{xy}(Y - \bar{Y})$
$X - 60 = 0.75(Y - 79)$
$X = 60 + 0.75Y - 0.75 \times 79$
$X = 60 + 0.75Y - 59.25$
$X = 0.75 + 0.75Y$

i) When X = 65, $Y = 38.98 + 0.667 \times 65 = 82.34$
ii) $r = \sqrt{b_{yx} \times b_{xy}} = \sqrt{0.667 \times 0.75} = 0.707$. Since both the regression co-efficients $b_{yx}$ and $b_{xy}$ are positive, r is also positive, i.e. r = 0.707.

There is a high degree of positive correlation between marks in Statistics and marks in Mathematics.




Example 3.30: Given is the following information:

 | X | Y
Arithmetic mean | 6 | 8
Standard deviation | 5 | 40/3

Coefficient of correlation between X and Y is 8/15.

Find a) The regression coefficients of Y on X and X on Y
b) The two regression equations.
c) The most likely value of Y when X = 100 rupees.

Solution: We have
Mean of X = $\bar{X}$ = 6, Mean of Y = $\bar{Y}$ = 8
S.D. of X = $\sigma_x$ = 5, S.D. of Y = $\sigma_y$ = 40/3
Coefficient of correlation between X and Y (r) = 8/15

a) Regression coefficient of Y on X is
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = \frac{8}{15} \times \frac{40/3}{5} = 1.422$$
Similarly, regression coefficient of X on Y is
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = \frac{8}{15} \times \frac{5}{40/3} = 0.20$$

b) The regression equation of Y on X is
$Y - \bar{Y} = b_{yx}(X - \bar{X})$
or, $Y - 8 = 1.422(X - 6)$
or, $Y = 8 + 1.422X - 1.422 \times 6$
or, $Y = 1.422X - 0.532$

Similarly, the regression equation of X on Y is
$X - \bar{X} = b_{xy}(Y - \bar{Y})$
or, $X - 6 = 0.2(Y - 8)$
or, $X = 6 + 0.2Y - 0.2 \times 8$
or, $X = 0.2Y + 4.4$

c) When X = 100, $Y = 1.422 \times 100 - 0.532$ = Rs. 141.67
The most likely value of Y is Rs. 141.67.


In a partially destroyed record, the following data are available:
Variance of X = 25, the regression lines are
5X − Y = 22 and 64X − 45Y = 24
Find a) Mean values of X and Y. b) Coefficient of correlation between X and Y. c) Standard deviation of Y.

Solution:
a) The regression equations of X on Y and Y on X are
$5X - Y = 22$ ... (i)
$64X - 45Y = 24$ ... (ii)
Since both the lines of regression pass through the mean values, the point $(\bar{X}, \bar{Y})$ must satisfy equations (i) and (ii). We get
$5\bar{X} - \bar{Y} = 22$ ... (iii)
$64\bar{X} - 45\bar{Y} = 24$ ... (iv)
Multiplying equation (iii) by 45 and subtracting equation (iv) from it, we get
$225\bar{X} - 45\bar{Y} = 990$
$64\bar{X} - 45\bar{Y} = 24$
$161\bar{X} = 966$
∴ $\bar{X} = \frac{966}{161} = 6$
Substituting the value of $\bar{X}$ in equation (iii), we get
$5 \times 6 - \bar{Y} = 22$
or, $\bar{Y} = 30 - 22 = 8$
Hence, the mean values are $\bar{X} = 6$, $\bar{Y} = 8$.

b) The regression equation of X on Y is
$5X - Y = 22$
or, $5X = 22 + Y$
or, $X = \frac{22}{5} + \frac{1}{5}Y$
Comparing this equation with $X = a + bY$,
$b_{xy} = \frac{1}{5}$
Similarly, the regression equation of Y on X is
$64X - 45Y = 24$
or, $45Y = -24 + 64X$
or, $Y = -\frac{24}{45} + \frac{64}{45}X$
Comparing this equation with $Y = a + bX$,
$b_{yx} = \frac{64}{45}$

Hence, the correlation coefficient between the two variables X and Y is given by
$$r = \pm\sqrt{b_{yx} \cdot b_{xy}} = \pm\sqrt{\frac{64}{45} \times \frac{1}{5}} = \pm 0.533$$
Since both regression coefficients are positive, r must be positive.
Hence r = 0.533.

c) We have $\sigma_x^2 = 25$, so $\sigma_x = 5$.
We know, $b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$
or, $\frac{64}{45} = 0.533 \times \frac{\sigma_y}{5}$
or, $\sigma_y = \frac{64 \times 5}{45 \times 0.533} = 13.33$ (taking r = 8/15 exactly)

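The correlation coefficient between two variables X and Y is r = 0.60. If
Average of variable X = 10, Average of variable Y = 20,
Coefficient of variation of X = 15, Coefficient of variation of Y = 10,
Find a) The regression equation of Y on X. b) The most likely value of Y when X = 18.

Solution: a) The regression equation of Y on X is
$Y - \bar{Y} = b_{yx}(X - \bar{X})$ ... (i)
Here, r = 0.60, $\bar{X}$ = 10, $\bar{Y}$ = 20, C.V. (X) = 15, C.V. (Y) = 10
For variable X,
C.V. (X) $= \frac{\sigma_x}{\bar{X}} \times 100$
⇒ $15 = \frac{\sigma_x}{10} \times 100$ ⇒ $\sigma_x = \frac{15 \times 10}{100} = 1.5$

For variable Y,
C.V. (Y) $= \frac{\sigma_y}{\bar{Y}} \times 100$
⇒ $10 = \frac{\sigma_y}{20} \times 100$ ⇒ $\sigma_y = \frac{10 \times 20}{100} = 2$

Then $b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.60 \times \frac{2}{1.5} = 0.8$

The regression equation of Y on X is
$Y - \bar{Y} = b_{yx}(X - \bar{X})$
⇒ $Y - 20 = 0.8(X - 10)$
⇒ $Y = 20 + 0.8X - 0.8 \times 10$
⇒ $Y = 12 + 0.8X$

b) When X = 18, the most likely value of Y is
$Y = 0.8 \times 18 + 12 = 14.4 + 12 = 26.4$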


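Regression Equation for a Bivariate Frequency Distribution

The calculation for obtaining the regression co-efficients is similar to that of the correlation co-efficient 'r' from a bivariate frequency distribution. However, since the regression co-efficients are independent of change of origin but not of scale, they are computed as follows.

Let $u = \frac{x - a}{h}$ and $v = \frac{y - b}{k}$, then $b_{yx} = \frac{k}{h}\, b_{vu}$ and $b_{xy} = \frac{h}{k}\, b_{uv}$

where a = assumed mean of x-series, b = assumed mean of y-series,
h = common constant or width of classes of x-series,
k = common constant or width of classes of y-series.

That is,
$$b_{yx} = \frac{N\sum fuv - \sum fu \sum fv}{N\sum fu^2 - (\sum fu)^2} \times \frac{k}{h} \quad \text{and} \quad b_{xy} = \frac{N\sum fuv - \sum fu \sum fv}{N\sum fv^2 - (\sum fv)^2} \times \frac{h}{k}$$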
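The effect of a change of origin and scale on the regression co-efficients can be checked numerically. The Python sketch below uses ungrouped, made-up values (each with frequency 1) and a hypothetical helper `slope`; it only illustrates the relations $b_{yx} = \frac{k}{h} b_{vu}$ and $b_{xy} = \frac{h}{k} b_{uv}$.

```python
def slope(xs, ys):
    """Regression coefficient of ys on xs: (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return (n * sxy - sx * sy) / (n * sxx - sx ** 2)

# Hypothetical mid-values, chosen only for illustration.
X = [12.5, 17.5, 22.5, 27.5, 32.5]
Y = [650, 450, 550, 350, 250]

a, h = 22.5, 5   # assumed mean and scale for the x-series
b, k = 450, 100  # assumed mean and scale for the y-series
U = [(x - a) / h for x in X]
V = [(y - b) / k for y in Y]

b_yx_direct = slope(X, Y)
b_yx_coded = (k / h) * slope(U, V)   # b_yx = (k/h) * b_vu
b_xy_direct = slope(Y, X)
b_xy_coded = (h / k) * slope(V, U)   # b_xy = (h/k) * b_uv
print(b_yx_direct, b_yx_coded, b_xy_direct, b_xy_coded)
```

Working with the coded values u and v keeps the sums small, and the scale factors k/h and h/k restore the original units.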


Family income and its percentage spent on food in the case of hundred families gives the following bivariate frequency distribution.

[Bivariate frequency table: food expenditure in % (rows) against family income in Rs with classes 200-300, 300-400, 400-500, 500-600 and 600-700 (columns). The cell frequencies are not recoverable here.]

i) Find the regression co-efficients. ii) Find the regression equations.
iii) Estimate the income of a family whose food expenditure is 21%.
iv) Calculate the correlation co-efficient; also test the significance of r.

Solution: Let x denote food expenditure and y denote income of family. Also let the assumed mean of x-series (a) = 22.5 and the assumed mean of y-series (b) = 450. We have,
h = 5 and k = 100, then $u = \frac{x - 22.5}{5}$ and $v = \frac{y - 450}{100}$

[Calculation table for the bivariate distribution; totals used below: $N = 100,\ \sum fu = 0,\ \sum fv = 0,\ \sum fu^2 = 100,\ \sum fv^2 = 120,\ \sum fuv = -48$.]

i) Regression co-efficient of y on x is
$$b_{yx} = \frac{N\sum fuv - \sum fu \sum fv}{N\sum fu^2 - (\sum fu)^2} \times \frac{k}{h} = \frac{100 \times (-48) - 0 \times 0}{100 \times 100 - 0} \times \frac{100}{5} = -9.6$$
and regression co-efficient of x on y is
$$b_{xy} = \frac{N\sum fuv - \sum fu \sum fv}{N\sum fv^2 - (\sum fv)^2} \times \frac{h}{k} = \frac{100 \times (-48) - 0 \times 0}{100 \times 120 - 0} \times \frac{5}{100} = -0.02$$

ii) Regression equations are $y - \bar{y} = b_{yx}(x - \bar{x})$ and $x - \bar{x} = b_{xy}(y - \bar{y})$
We have, $\bar{x} = a + \frac{\sum fu}{N} \times h = 22.5 + \frac{0}{100} \times 5 = 22.5$
and $\bar{y} = b + \frac{\sum fv}{N} \times k = 450 + \frac{0}{100} \times 100 = 450$

Regression equation of y on x is $y - \bar{y} = b_{yx}(x - \bar{x})$
$y - 450 = -9.6(x - 22.5)$
∴ $y = -9.6x + 666$
And regression equation of x on y is
$x - \bar{x} = b_{xy}(y - \bar{y})$
$x - 22.5 = -0.02(y - 450)$
∴ $x = -0.02y + 31.5$

iii) Here food expenditure of family (x) = 21%
To estimate the likely income of the family, put x = 21 in the regression equation of y on x, i.e.
$y = -9.6x + 666 = -9.6 \times 21 + 666 = 464.4$
Estimated income of the family is Rs. 464.4.

iv) We have, the two regression co-efficients are
$b_{yx} = -9.6$ and $b_{xy} = -0.02$
Correlation co-efficient $r = \pm\sqrt{b_{xy} \times b_{yx}} = \pm\sqrt{(-9.6) \times (-0.02)} = \pm\sqrt{0.192} = \pm 0.44$
Since both the regression co-efficients are negative, the correlation co-efficient is also negative,
i.e. r = −0.44
Also, to test the significance of r,
$$\text{P.E.}(r) = 0.6745 \times \frac{1 - r^2}{\sqrt{N}} = 0.6745 \times \frac{1 - 0.192}{\sqrt{100}} = 0.055$$
and 6 P.E.(r) = 6 × 0.055 = 0.33
Thus r is significant because |r| = 0.44 > 6 P.E.(r) = 0.33.
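The probable error test used in part iv) is easy to script. The Python sketch below is a minimal illustration with assumed function names; it applies P.E.(r) = 0.6745 (1 − r²)/√N and the rule that r is significant when |r| > 6 P.E.(r).

```python
from math import sqrt

def probable_error(r, n):
    """Probable error of the correlation coefficient r from n pairs."""
    return 0.6745 * (1 - r * r) / sqrt(n)

def is_significant(r, n):
    """Rule used in the text: r is significant if |r| exceeds 6 * P.E.(r)."""
    return abs(r) > 6 * probable_error(r, n)

# Regression coefficients and N from the family income example.
b_yx, b_xy, n = -9.6, -0.02, 100
r = -sqrt(b_yx * b_xy)  # negative because both coefficients are negative
pe = probable_error(r, n)
print(f"r = {r:.3f}, P.E. = {pe:.4f}, 6 P.E. = {6 * pe:.3f}, significant: {is_significant(r, n)}")
```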


If the correlation co-efficient between two variables x and y is 0.9, then what is the co-efficient of determination? Also, interpret it.

Solution: Here, correlation co-efficient (r) = 0.9
We have, co-efficient of determination = r² = (0.9)² = 0.81. This implies that 81% of the variation in the dependent variable has been explained by the independent variable and the remaining 19% of the variation is due to other variables.

The correlation between two variables X and Y is 0.6 and the correlation between two other variables U and V is 0.3. Does it mean that the first correlation is twice as strong as the second? Give reason.

Solution: Here, the correlation between X and Y is r_XY = 0.6 and the correlation between U and V is r_UV = 0.3.
No, it does not mean that the first correlation, i.e. the correlation between X and Y, is twice as strong as the second correlation, i.e. the correlation between U and V, because the correlation co-efficient measures only the degree of linear relationship between the variables. To compare the correlations, it is required to obtain the co-efficient of determination, i.e. the percentage of variation of the dependent variable explained by the variation of the independent variable. Here (0.6)² = 0.36 and (0.3)² = 0.09, so the first relationship explains four times as much variation as the second.

Example 3.36: The data gives the marks in Statistics and Mathematics obtained by 7 students. Find i) the correlation co-efficient, ii) by what percent the variation of marks in Statistics is due to the variation of marks in Mathematics.

[Data and calculation table for the 7 students; only the totals survive.]
Total: $\sum x = 164,\ \sum y = 304,\ \sum xy = 7155,\ \sum x^2 = 3868,\ \sum y^2 = 13246$

i) Karl Pearson's co-efficient of correlation is given by
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \frac{7 \times 7155 - 164 \times 304}{\sqrt{7 \times 3868 - (164)^2}\,\sqrt{7 \times 13246 - (304)^2}} = 0.9757$$

ii) Co-efficient of determination = r² = (0.9757)² = 0.952
Hence 95.2% of the variation in marks obtained in Statistics is explained by the marks obtained in Mathematics.
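The calculation in Example 3.36 can be reproduced from the column totals. The following Python sketch is illustrative only, and the function name `pearson_from_sums` is an assumption.

```python
from math import sqrt

def pearson_from_sums(n, sx, sy, sxy, sxx, syy):
    """Karl Pearson's r from n, Σx, Σy, Σxy, Σx², Σy²."""
    num = n * sxy - sx * sy
    den = sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2)
    return num / den

# Totals from Example 3.36.
r = pearson_from_sums(7, 164, 304, 7155, 3868, 13246)
r_squared = r * r  # co-efficient of determination
print(f"r = {r:.4f}, r^2 = {r_squared:.3f}, explained variation = {100 * r_squared:.1f}%")
```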


Theoretical Questions
1. What is meant by correlation? Write the measures of correlation between two variables.
2. Define correlation. Explain various types of correlation.
3. Does the degree of correlation between two variables signify the existence of a cause and effect relationship between the variables? Explain it.
4. Define Karl Pearson's co-efficient of correlation. What are the special characteristics of the Pearsonian correlation co-efficient?
5. What do you mean by probable error of correlation co-efficient? Write its uses.
6. Define Spearman's rank correlation co-efficient. How is it different from Karl Pearson's correlation co-efficient? Discuss.
7. What is co-efficient of determination? Write its uses.
8. What is regression analysis? How does it differ from correlation analysis?
9. What do you mean by regression? Write its uses.
10. Define regression co-efficient. Write its properties.
11. Differentiate between
i) Positive correlation and negative correlation.
ii) Partial correlation and multiple correlation.
iii) Linear regression and curvilinear regression.
iv) Correlation co-efficient and regression co-efficient.
12. Write down the equations of the two regression lines, explaining the constants used in them. Also write the properties of regression co-efficient.
13. Define scatter diagram. State its merits and demerits.
14. What do you mean by least square method? Explain it.


Exercise 3.2 


Numerical and Practical Problems 
1. From the following data, ascertain with the help of a scatter diagram whether the income and
expenditure of the workers of an industry are correlated or not.


210 | 215 


Average expenditure (in Rs): 
2. Find Karl Pearson's co-efficient of correlation, when
i) Cov (X, Y) = 10, Var (X) = 6.25 and Var (Y) = 13.36
ii) $\bar{X} = 25,\ \bar{Y} = 18,\ \sum(X - \bar{X})^2 = 136,\ \sum(Y - \bar{Y})^2 = 138,\ \sum(X - \bar{X})(Y - \bar{Y}) = 122$ and N = 15
iii) $n = 10,\ \sum x = 18,\ \sum y = 25,\ \sum x^2 = 90,\ \sum y^2 = 120$ and $\sum xy = 65$
iv) $n = 10,\ \sum xy = 120,\ \sigma_x = 3$ and $\sigma_y = 8$, where $x = (X - \bar{X})$ and $y = (Y - \bar{Y})$.
3. Find Karl Pearson's correlation co-efficient between x and y if the observations (x, y) are as follows:
(9, 8), (8, 10), (6, 9), (5, 7), (10, 5), (6, 6), (4, 2), (3, 0), (2, 2), (1, 1)
4. Calculate 'r' for the following data:



5. Find the co-efficient of correlation from the following data:

[Data table not recoverable.]

Also draw the scatter diagram and interpret it.
6. Find Karl Pearson's co-efficient of correlation between the sale and expenditure of a firm for six months.

[Data table with rows Month (Jan, Feb, ...), Sale (in Rs '00) and Expenses (in Rs '00); the values are not recoverable.]

The following table gives the distribution of boys and also regular football players among them according to age group. Find out the correlation between 'age' and 'playing habit' of boys.

[Data table with rows Age (in years), No. of boys and No. of regular players; only the entries 270 and 162 survive.]

9. (a) The following table gives the distribution of the total population and those who are totally or partially blind among them. Find out if there is any relation between age and blindness.

[Data table by age (in years) with classes 0-10, 10-20, ..., 50-60, 60-70, 70 and above; the values are not recoverable.]

[A further table with rows No. of attendance and No. of successful students; its question text is not recoverable.]

10. With the following data in 6 localities, calculate the co-efficient of correlation between the density of population and the death rate.

[Data table with rows including Population ('000) and No. of deaths; only the entry 30 survives.]

11. Co-efficient of correlation between X and Y for 20 items is 0.3, mean of X-series is 15 and that of Y-series is ..., and their standard deviations are 4 and 5 respectively. At the time of calculation, one item 17 was wrongly copied instead of 27 in case of X-series and 35 instead of 30 in case of Y-series. Find the correct co-efficient of correlation.

1A ' ' 
computer while calculating the correlation co-efficient between the x and y obtained the results. 


N=35, 2x=120, Y= 550 
Sy =105, 3y=500, 2x = 350 


of checking that it 
(13, 12) respectively obtain the co 


If 
ie however, later discovered at the time had copied down two pair of items 
eat (1251) instead of (7, 8) and rrect value of the 


co ; ; 
trelation co-efficient between x and y. 
13. The information given below is related to the ages of husband (X) and wife (Y) for married couples living together in a sample survey. Calculate the co-efficient of correlation between the age of husband and that of his wife. Test the significance of the calculated value of r.

$N = 72$, $\sum fx = 3560$, $\sum fx^2 = 196800$, $\sum fy = 3260$, $\sum fy^2 = 168400$, $\sum fxy = 172000$.


14. From the data given below, find the co-efficient of correlation between the drivers' age and the number of accidents made by them.

Age (in years): 25-30 | 30-35 | 35-40 | 40-45 | 45-50
15. The following table gives the distribution of sales (in Rs. '00) and profit (in Rs. '00) of 100 shops. Find the co-efficient of correlation and its probable error. Also state whether the correlation co-efficient is significant or not.
Profit (in Rs. '00') 


| Sales (in Rs '000') 


650 — 750 

750 — 850 

16. Correlation co-efficient between the ages of fathers and sons is 0.9031. Discuss if the value of r is significant or not. Also compute the limits for population correlation.


17. Ten students were examined in Accountancy and Statistics. The ranks obtained by the students are given below. Find Spearman's rank correlation co-efficient.


Ranks inf 1/2]3 1/415 7 
Accountancy: 


[Ranks in Statistes: | 2] 4 [1 [5 [3 [9 [7] 


tancy pape 


i) Which pair of judges disagree the most? 
ii) Which pair of judges has the nearest approach to common taste of beauty?


22. 


27. 


3. 


Calculate the co-efficient of correlation between VY 


* From the following data 


beautiness and intelligence of 10 girls was found to be 


Find the correct correlation co-efficient. 


If r = 0.5, then find the co-efficient of determination and interpret the result.


and Y. 
=a 7 6 | 5 i a i i ce ee 
Pe | Ss 3 2 

£2. | a mca | ea 


Also find the percentage of variation explained. 
Find the regression lines for the following pairs (x, y) for the values of . 
(1, 6). (8. 1), 3. 0). (2, 0), 1, 1), 1, 2), (7, 1), (3.5) 

From the following data, obtain the two regression equations 
‘Sales (in Rs.): 9] {97 | 108 
Purchase (in Rs.): 71 175 169 


The following table gives the ages and blood pressure of 10 women. 


age: [56 [42 [36 [47 49 [a2 [60 [72] 
| 


i) Find the correlation co-efficient between age and B.P. 


v and y. 


150 


ii) Determine the least square regression equations. 
iii) Estimate the blood pressure of a woman of age 45 years. 


The advertisement expenses and the sales of a product are recorded as below. 
|Adv. exp. (Rs. '000') 
Sales (Rs '000') 
Estimate the sales when advertisement expenses is Rs. 15,000. 


The following data gives the experience of machine operators in years and their performance as 
given by the number of good parts turned out per 100 pieces.

| eta 
isp fief a [3 [ols | | 
rar es [ 89 | | 78 | 80 | 75 | 8 | 


i i i having performance of 158.87. 
Find regression equation and estimate the experience of the operator gp 


Nn 


and estimate Y wh = 36. : 
Ina Survey of hits ; se pairs of husband and wife, the following is recorded. 
Age of husband: 28 i 15 
Th pie [is [opr [TS TE importation of his wife's age, predict 
e is 30 years. He deny ; : - their ages. 
€ cha il nathan ened same degree as elation ie a eee . aie sci tile 
°*f observations obtained the following results N’= ani 
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hat the computer had ¢o); 
: h Pleq 
I discovered at t 


150 _ A Textbook of Provan e time of checking t 


SEXY = 39815. It was however late 


down two pairs as 
i eee 
EC eens (05) 


Calculate the correc 
i) Test whether the value of ca 


ii) Find regression co-efficient of 


f correlation & 


t values of co-efficient 0 
-efficient of correlatio 


Iculated co 
XonY 


n is significant or not. 


| iii) Find regression co-efficient of Y on X. 

iv) Find regression line of Y on X. 

v) Estimate Y when X=65. 
32. Given the information:

Sum of X = 5, Sum of Y = 4

Sum of squares of deviations from the mean of X = 40

Sum of squares of deviations from the mean of Y = 50

Sum of the products of deviations from the means of X and Y = 32, no. of pairs of observations = 10.

i) Find the regression co-efficients of Y on X and X on Y.
ii) Find the Pearsonian co-efficient of correlation.
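A minimal sketch of how question 32 can be worked from the deviation sums, assuming the standard relations b_yx = Σxy/Σx², b_xy = Σxy/Σy² and r = ±sqrt(b_yx · b_xy), with r taking the sign of the regression co-efficients. The function name is hypothetical.

```python
import math

def regression_from_deviation_sums(sxx, syy, sxy):
    """Return (b_yx, b_xy, r) from sums of squared deviations and products."""
    b_yx = sxy / sxx  # regression co-efficient of Y on X
    b_xy = sxy / syy  # regression co-efficient of X on Y
    # r is the geometric mean of the two co-efficients, with their common sign
    r = math.copysign(math.sqrt(b_yx * b_xy), sxy)
    return b_yx, b_xy, r

b_yx, b_xy, r = regression_from_deviation_sums(sxx=40, syy=50, sxy=32)
print(b_yx, b_xy, round(r, 4))
```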
33. While calculating the co-efficient of correlation between two variables X and Y, the following results were obtained:
$N = 25$, $\sum X = 125$, $\sum Y = 100$, $\sum X^2 = 650$, $\sum Y^2 = 460$, $\sum XY = 508$
Later, at the time of checking, it was found that two pairs of observations (X, Y) were copied as (6, 14) and (8, 6) while the correct values were (8, 12) and (6, 8). Find


i) Correct values
ii) Correct correlation co-efficient
iii) Correct equations of the lines of regression, and interpret the significance of the co-efficient of correlation.
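The correction procedure in question 33 can be sketched as follows: subtract the contribution of each wrongly copied pair from every sum, add the contribution of each correct pair, then recompute r. The helper names `corrected_sums` and `pearson_r` are hypothetical.

```python
import math

def corrected_sums(sums, wrong_pairs, right_pairs):
    """sums = (n, sum x, sum y, sum x^2, sum y^2, sum xy)."""
    n, sx, sy, sxx, syy, sxy = sums
    # Remove the wrong pairs (sign -1), then add the right ones (sign +1).
    for sign, pairs in ((-1, wrong_pairs), (1, right_pairs)):
        for x, y in pairs:
            sx += sign * x
            sy += sign * y
            sxx += sign * x * x
            syy += sign * y * y
            sxy += sign * x * y
    return n, sx, sy, sxx, syy, sxy

def pearson_r(n, sx, sy, sxx, syy, sxy):
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx ** 2) * (n * syy - sy ** 2))

fixed = corrected_sums((25, 125, 100, 650, 460, 508),
                       wrong_pairs=[(6, 14), (8, 6)],
                       right_pairs=[(8, 12), (6, 8)])
r = pearson_r(*fixed)
print(fixed, round(r, 2))
```

With these data only the sums involving Y change, because the two X values are merely swapped between the pairs.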
34. In a bivariate data, the mean value of X is 20 and the mean value of Y is 45. The regression co-efficient of Y on X is 4 and that of X on Y is …
Find


i) Co-efficient of correlation
ii) The standard deviation of X if the standard deviation of Y is …
iii) The two regression equations
iv) Estimate the value of X when Y = 25
35. You are given the following information about profit and sales:


Profit (Rs. in lakhs) | Sale (Rs. in lakhs)


i) Find the regression co-efficients.
ii) Find the equations of lines of regression.
iii) Find the estimated sale when profit is Rs. 15 lakhs.
iv) What should be the profit if a company wants to attain a sales target of Rs. 120 lakhs?


36. Out of the following two regression lines, identify the line of regression of x on y and the line of regression of y on x. Why?
2x + 3y - 7 = 0 and 5x + 4y - 9 = 0


37. The equations of two regression lines obtained in a regression analysis are as follows:
3x + 12y - 19 = 0 and 9x + 3y - 46 = 0. Obtain
i) The means of x and y
ii) The regression co-efficients of y on x and x on y
iii) Correlation co-efficient between x and y.


38. For 50 students in a class of BBS, the regression equation of marks in Statistics (y) on the marks in Economics (x) is 5x - 4y + 8 = 0. Average marks in Economics is 44 and the ratio of standard deviations $\sigma_y : \sigma_x$ is 5 : 2. Find the average marks in Statistics and the co-efficient of correlation between marks in the two subjects.


39. The two regression lines are given by
3x + 2y = 6 and 7x + 5y = 12
i) Identify the lines of regression
ii) Estimate y when x = 10
iii) Calculate the co-efficient of correlation between x and y.
iv) What percentage of total variation remains unexplained by the regression equation of y on x?
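One way to approach part i) of such questions is to try both assignments of the two lines and keep the one for which the product of the regression co-efficients does not exceed 1 (since r² = b_yx · b_xy ≤ 1). The sketch below applies this check to the two lines above; the function name `identify_lines` and the tuple layout (a, b, c) for a·x + b·y = c are assumptions made for illustration.

```python
import math

def identify_lines(line1, line2):
    """Each line is (a, b, c), meaning a*x + b*y = c.

    Returns (y_on_x, x_on_y, r) for the assignment with 0 <= b_yx * b_xy <= 1.
    """
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        b_yx = -y_on_x[0] / y_on_x[1]  # slope when the line is solved for y
        b_xy = -x_on_y[1] / x_on_y[0]  # slope when the line is solved for x
        product = b_yx * b_xy
        if 0 <= product <= 1:
            r = math.copysign(math.sqrt(product), b_yx)
            return y_on_x, x_on_y, r
    raise ValueError("no valid assignment of regression lines")

y_on_x, x_on_y, r = identify_lines((3, 2, 6), (7, 5, 12))
a, b, c = y_on_x
y_at_10 = (c - a * 10) / b  # estimate of y when x = 10
print(y_on_x, x_on_y, round(r, 3), y_at_10)
```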
40. Find the correlation between the two variables from the following bi-variate frequency table.


Marks in Science \ Marks in English: … | 50-75 | 75-100


Also estimate the marks in Science of a student who secured 95 in English.


41. Following is the distribution of students according to their height and weight.
i) Calculate the two regression co-efficients.
ii) Obtain two regression equations.


43. 


45. 


| 44. 


enses. 


a of sales and exp 
s and interpret it, 


culat 
(a) Calcul fficient betw 


rrelation Coe 
on expenses On sale 


(b) Compute the co 
(c) Test the significa 
(d) Develop the estimating eq 

Estimate sales if the promotiona 
h parameter of t 


curately a new J 
k is to look at th 
t. A sample 0 
ob performance 1 


eee ol a ee 

Salary “000” Rs. 6 [25 [33 [15 [28 [19 |20 [22 | 
ee - estimating equation that best describes these data and estimate the salary of an employee 
whose job performance is 10 and 2. Al ists signi i i 
me so find whether there exists significant relationship between 
A computer while calculatin i i 

g the correlation coefficient b 
of observation obtained the following results: eS eae yee eee 
n= 8, EX = 562, X° = 39602, ZY= 2 

, 561, ZY = 39815, ZXY = 39441 


It was however, disco 
: vered later at the ti i i i 
tog sae ime of checking that it had copied down two pairs of 
airs of wrong 


ation that e effect of promotl 
ua 

| expenses is Rs. 20,00U- . | 
he equation; in terms of above information. 
ob performance index measures what js 


e relationship between job evaluation 
f eight employees was taken and 
ndex (1-10, 10 is best) was 


(e) Explain the meaning of eac 


nterested in seeking how ac’ 
y to chec 


gnificant or no 
d rupees and j 


A researcher is 1 
important for corporation. One wa 
index and an employee’s salary is Si 
information about salary in thousan 


collected. 


(a) Test whether calculated coefficient of 

| Regression coefficient of Xon y 
7 Regression coefficient of Y on Xx 7 
Regression line of YonX | 

(e) Estimate y when Y= 65 


While calculating the 


X=8 and | 
25 Y= 10 
10 Y= 12 and 7 WwW 
and 8. Obtaj &fe copied 
on X and final] blain the correct Wrongly, the ¢ 
Y find out Whether ge t°S and th vtesponding co 


t th 0 
P the : 
regression equation of Y on y and 


ere exists 
an ; 
y relationship betwe 
en these t 
WO vy 


ariables or nol- 


| 


| 


46. The income and expenditure of 100 families is given below:


Expenditure (Rs.) 
500-1000 | 1000-1500 | 1<0q 


1500-2000 _ 
1000- 2000 " 


2000-3000 19 

3000-4000 

Find (a) Two regression coefficients 
(b) Coefficient of correlation between income and expenditure 
(c) Mode value of expenditure and income. . 
(d) Coefficient of variation of expenditure and income. 

47. From the following bi-variate table. 


Income (Rs. 
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1500 — 2000 | 2000 ~ 2500 




i) Compute two regression co-efficient 

ii) Co-efficient of correlation between income and expenditure. 

iii) Estimate the expenditure of person, when his income is Rs 4,000. 

iv) Which is more uniform, income distribution or expenditure distribution? 
v) Find the modal expenditure. 

vi) Determine the median value of frequency distribution of income. 


Answers 
1. High degree positive linear correlation.
2. i) 0.72 ii) 0.89 iii) 0.35 iv) 0.5
3. 0.754  4. r = 0  5. 1, perfect positive linear correlation
6. 0.70  7. 0.892, high degree positive correlation
8. -0.92  9. (a) 0.89 (b) 0.64  10. 0.99
12. 0.093  13. r = 0.52, r is significant  14. -0.699
16. r is significant and limits are 0.8626 and 0.9436.  17. 0.7575
19. 0.959 ; IT and Ill ji, Land Ill one. teeta 
v5) mS 3, r= 0.96 and 92. 16% of variation xP aine 


r=0.25 and 25% of the data are explained. 


2 
4. y= 0.3042, 42.8745 and x = — 0.27784y * 3.4306 
= 0.6132 x + 14.812 and x = 1.36ly~ >- 134 when age = 45 


®- i) r= 0,89 iiy y= 83.758 +1. 1 iy 


| 
H 
i 


j) no. conclusion 


= 1.133x , = 0.60 
28. ¥ 1, r=": 
sae se 53.24 30. 19 years nearly vy) v=9 667x + 23 644 
RNs. a eee 
ii) bxy=0. er 
v) 67 a aa ge 
by = 0.80. by ia - 43, LXY = 520 
or & = 125 Sy = 100, Ex? = 650, 21" 6 
93, Ane dees) 6Y+2.77 


34. 


35. 


36. 
37. 


38. 
39, 


40. 
41. 


42. 


47. 


+ Y=~2.11321 +4.2138X, r= 0.9853, rig highly g 
+ by =0.607, byy 
» Y=-2.11321 +.4.2138X, r= 0.9853, , 
- @) byx= 0.208, by =— 0.865 


iii) Y= 0.8", x= 0.55 


: = 0.075, significant. aoe 
oo ii) oy=2 jit) yo 4x35 andx=gy* I> 
i) r=0.67 uN 7 

iv) x= 17.78 3.2 ce 58 and x= 0.2y — 8 


i) yp = 3.2 and by = 0.2 ie y= 
ii) Estimated sale = Rs. 106 Lakhs. 
36. x on y is 5x + 4y - 9 = 0 and y on x is 2x + 3y - 7 = 0, because $b_{yx} \times b_{xy} < 1$.
i) F=5,7=5 il) by =- 4 & by =-3 
iii) r=—0.29 

38. $\bar{y} = 52$ and r = 0.5

39. i) x on y is 3x + 2y = 6 and y on x is 7x + 5y = 12
ii) y = -11.6 iii) r = -0.96 iv) 7.84%

40. r = 0.736, marks in science = 87.5

41. i) $b_{yx} = 0.0405$ and $b_{xy} = 0.1518$
ii) y = 0.0405x + 55.9314 and x = 0.1518y + 99.94
iii) 108.47 lbs
iv) 60.6 inches.


(a) bXY=0.545, (b) r= 0.6, (c) no conclusion 


() ¥=23.644 +0.667.X (©) ¥=67,00,000 


ignificant 
concluded, x 


is highly Significant, 


= 0.9, r= 0.739, nothi 
Ing can be = 15.34 + 0.607, Y: Rs. 2748000 


(b) r= 0.424, , Negative correlation 
ture) = Rs.1196.97 
Penditure) = 43.265% 


(c) Mode (income) = Rs.2696,97, Mode (expendi 
(d) CV. (Income) = 34.58%, CV (Ex a 


() by = 0.484 and b,, = 0.676 


(iii) Estimated ex i 
Penditure = Rs 21 
(v) M,=Rs 489 i 


(ii) r=0.572 


” xpenditure is more uniform than incom? 
(vi) $M_d$ = Rs 1045.45

Multiple Choice Questions
Circle (O) the correct answer.
1. What is the range of correlation coefficient?
(a) 0 to 1 (b) $-\infty$ to $\infty$ (c) … (d) 0 to 1
2. If r = 0.3, then the coefficient of determination implies that
(a) 30% of total variation in dependent variable has been explained by independent variable.
(b) 60% of total variation in dependent variable has been explained by independent variable.
(c) 3% of total variation in dependent variable has been explained by independent variable.
(d) 4% of total variation in dependent variable has been explained by independent variable.


3. The regression lines of X on Y and Y on X intersect at the point:


(a) (H, 9) (b) (4,5) (©) (YY (@) (%,% 
4, The term regression was introduced by: 
(a) R.A. Fisher (b) Sir Francis Galton (c) Karl Pearson (d) None of above 


5. If X and Y are two variates, there can be at most:
(a) one regression line (b) two regression lines 
(c) three regression lines (d) an infinite number of regression lines 
6. In a regression line of Y on X, the variable X is known as:
(a) independent variable (b) regressor 
(c) explanatory variable (d) All the above 
7. Regression equation is also named as: 
(a) prediction equation (b) estimating equation 
(c)_ line of average relationship (d) all the above 
8. Scatter diagram of the variate values (X, ¥) gives poe pecs a 
(a) functional relationship ce adios 
9. The estimate of f in the regression equation Y= 0 + eee (a) efficient 
(a) biased (b) unbiased | 
10. The formula for the estimation of B in the regression equan® 


Sr 
(b) "oy 


(d) All the above 


(c) consistent 
n Y=at+PX+eEls: 


(a) cov (xX, Y)/ V(X) 
© 4X N/UK-* 


" the: . 
11. In the regression line $Y = \alpha + \beta X$, $\beta$ is called the:
(a) slope of the line (b) intercept of the line
(c) constant (d) rank
12. In the regression line $Y = \beta_0 + \beta_1 X$, $\beta_0$ is the:
(a) slope of the line (b) intercept of the line
(c) rank (d) coefficient


13. If $\beta_{YX}$ and $\beta_{XY}$ are two regression coefficients, they have:
(a) same sign (b) opposite signs
(c) either same or opposite signs (d) nothing can be said

14. The property that $\beta_{YX}$, $\beta_{XY}$ and $\rho$ have the same signs is called:
(a) fundamental property (b) signature property
(c) magnitude property (d) mean property

15. The property that the average of regression coefficients is always greater than or equal to the correlation coefficient is called:
(a) fundamental property (b) signature property
(c) magnitude property (d) mean property

16. If $\beta_{YX} > 1$, then $\beta_{XY}$ is:
(a) less than 1 (b) greater than 1 (c) equal to 1 (d) equal to 0

17. If $\beta_{YX} < 1$, then $\beta_{XY}$ is:
(a) less than 1 (b) greater than 1 (c) equal to 1 (d) equal to 0

18. If $\rho$ = …, the two lines of regression are:
(a) coincident (b) parallel (c) perpendicular to each other (d) none of the above

19. If $\rho = 1$, the angle between the two lines of regression is:
(a) zero degree (b) ninety degree (c) sixty degree (d) thirty degree

20. If $\rho = 0$, the lines of regression are:
(a) coincident (b) parallel (c) perpendicular to each other (d) none of the above

Answers key 


aK 


… pack of cards etc. But nowadays, there is hardly any discipline (field) where "probability" has not been used. The theory of probability is … science, biological science, medical science, …


Business firms, factory managers and stock market policy makers often face problems regarding the chance of selling goods, the chance of receiving better demand, the chance of getting a defective or non-defective product, and the chance of deterioration or prosperity of the stock market. The word probability means chance or possibility, and it is widely used in daily life also.


e.g. What is the chance of heavy rain this morning?
What is the chance of a BCA student passing the final examination?


Probability is a numerical measure (with a value lying between 0 and 1) of the likelihood that a particular event will occur. That is, $0 \le P \le 1$. It is generally denoted by 'P'. If an event is certain to occur, then its probability is equal to one. This probability is also called the probability of certainty. If it is completely impossible for an event to occur, then its probability is zero and is called the probability of uncertainty. For example, the probability of the sun rising in the east is 1 and the probability of the sun rising in the west is 0. This shows that the range of probability is between zero and one. This range of probability is shown as follows.


0: Probability of uncertainty
1/2: Probability of event as likely to occur or not
1: Probability of certainty


4.2 Basic Terminologies in Probability

1. Experiment: The process (phenomenon) performed to get different possible outcomes is known as an experiment. For example, tossing a coin, …

2. Random Experiment: An experiment in which all the possible outcomes are known in advance and no personal bias is expected is called a 'random experiment'. In other words, an experiment is called a random experiment if, when it is performed a large number of times under essentially homogeneous conditions, the result is not unique but may be any one of the various possible outcomes. Tossing an unbiased (fair) coin and rolling a uniform die are some examples of random experiment.

… outcome or combination of outcomes … For example, flipping (tossing) …

2. Sample Space: The set of all possible outcomes of a random experiment is called the sample space. Each outcome is thus considered as a sample point. It is usually denoted by S. For example, in tossing an unbiased coin once, the possible outcomes are head (H) and tail (T). So, the sample space is S = {H, T}.

3. Exhaustive Cases: The total number of possible outcomes of a random experiment is known as the exhaustive cases for the experiment. In a toss of a single coin, the exhaustive number of cases is 2. Similarly, in rolling a six-faced die, the exhaustive number of cases = 6.

4. Favourable Cases or Events: The number of outcomes of a random experiment which result in the happening of an event, or which are desired to occur, are termed as the cases favourable to the event. In other words, the outcomes which are in favour of the happening of an event are called favourable cases. For example, in drawing a card from a pack of 52 cards, the number of cases favourable to drawing a queen is 4. Similarly, in a toss of two coins, the number of cases favourable to the event 'exactly one head' is 2, viz., HT, TH.

5. Equally Likely Events: Events or cases are said to be equally likely or equally probable if none of them is expected to occur in preference to the others. In other words, events are said to be equally likely if all of them have an equal chance of occurrence. In tossing an unbiased coin, head and tail are equally likely; in throwing an unbiased die, the faces 1, 2, 3, 4, 5, 6 are equally likely.

6. Mutually Exclusive Events: Two or more events are said to be mutually exclusive if the happening of any one of them excludes the happening of all others in the same experiment, i.e. if two or more events cannot occur simultaneously in the same trial. For example, in a single tossing of a coin, we may get either head or tail but not both. Thus, the events head and tail in a single tossing of a coin are mutually exclusive. Similarly, in the throw of a die, the six faces numbered 1, 2, 3, 4, 5 and 6 are mutually exclusive. Thus, no two or more of them can happen simultaneously.

7. Independent Events: Two or more events are said to be independent if the occurrence of one event does not affect the occurrence of the other events and vice versa. For example, the occurrence of head in the first tossing is independent of the occurrence of head in the second tossing. Similarly, drawing of …


4.3 Fundamental Principle of Counting


It is also known as the basic principle of counting. The counting rules facilitate calculating all the possible outcomes in an experiment.


That is, if one operation can be performed in $n_1$ different ways and, after it has been performed, a second operation can be performed in $n_2$ different ways, then both the operations can be performed in $n_1 \times n_2$ ways; i.e. if an event A can occur in $n_1$ ways and after it an event B can occur in $n_2$ ways, then both the events can occur in a total of $n_1 \times n_2$ ways.


The result can be generalized to more than two events. If there are k separate parts of an experiment, and the first part can be done in $n_1$ ways, the second successive part in $n_2$ ways ... and the k-th successive part in $n_k$ ways, then the total number of possible outcomes is given by $n_1 \times n_2 \times \cdots \times n_k$.


For example, consider tossing an unbiased coin thrice. There are two possible outcomes in each toss, namely head 'H' and tail 'T'. Hence, the total number of possible outcomes


$= 2 \times 2 \times 2 = 2^3 = 8$.
HHH, HHT, HTH, HTT, TTT, THH, THT, TTH
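The coin example above can be checked by enumeration. The following is a small sketch using Python's standard library; the variable names are illustrative only.

```python
from itertools import product
from math import prod

# Number of ways each successive part of the experiment can occur
ways_per_part = [2, 2, 2]      # three tosses, two outcomes each
total = prod(ways_per_part)    # n1 * n2 * ... * nk

# Enumerate the sample space explicitly as strings such as "HHT"
outcomes = ["".join(toss) for toss in product("HT", repeat=3)]

print(total)     # 8
print(outcomes)
```

Enumerating the outcomes is practical only for small experiments; the product rule gives the count without listing anything.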


4.2.1 Permutation and Combination


1. Permutation: The literal meaning of the word permutation is "arrangement". Therefore, permutation is the arrangement of objects taken some or all at a time in some order. If there are n objects and they are to be placed in any definite arrangement or order, the number of permutations of n different objects taken r objects at a time is denoted by $^nP_r$ or $P(n, r)$ and defined as


$$P(n, r) = \frac{n!}{(n - r)!}, \quad \text{where } r \le n \text{ and } n! = n(n - 1)(n - 2) \cdots 3 \times 2 \times 1$$


Example 4.1: How many numbers of 4 different digits can be formed with the digits 1, 2, 3, 4, 5?
Solution: Here n = 5, r = 4.
Then, the required permutation is


$$P(n, r) = \frac{n!}{(n - r)!} = \frac{5!}{(5 - 4)!} = \frac{5!}{1!} = 5 \times 4 \times 3 \times 2 \times 1 = 120$$


Hence, 120 different numbers of 4 digits can be formed.
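As a quick check of this result, the sketch below computes P(5, 4) both from the formula and by listing the arrangements. It assumes Python 3.8 or later, where `math.perm` is available.

```python
from itertools import permutations
from math import factorial, perm

n, r = 5, 4
by_formula = factorial(n) // factorial(n - r)  # n! / (n - r)!
by_library = perm(n, r)                        # same value from the standard library

# List every 4-digit number with distinct digits drawn from 1..5
numbers = ["".join(p) for p in permutations("12345", r)]

print(by_formula, by_library, len(numbers))  # 120 120 120
```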


4.2.2 Permutation of Objects Not All Different


The number of permutations of n objects taken all at a time, when p objects are of one kind, q objects are of a second kind, r objects are of a third kind, s objects are of a fourth kind and so on, is


$$\frac{n!}{p!\, q!\, r!\, s! \cdots}$$
Note that: 0! = 1 and 1! = 1


Example 4.2: Find the total number of arrangements of the letters of the word 'STATISTICS' taken all at a time.


Solution: Here, the required arrangement is given by

$$\frac{10!}{3!\, 3!\, 2!\, 1!\, 1!} = \frac{10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4}{3!\, 2!} = 50400$$
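The same count can be obtained programmatically by tallying how often each letter occurs and dividing n! by the factorial of each tally. This is a minimal sketch; the function name `arrangements` is hypothetical.

```python
from collections import Counter
from math import factorial

def arrangements(word):
    """Number of distinct arrangements of all letters of `word`."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)  # divide out repeats of each letter
    return total

print(Counter("STATISTICS"))       # S: 3, T: 3, I: 2, A: 1, C: 1
print(arrangements("STATISTICS"))  # 50400
```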


2. Combination: The literal meaning of the word combination is "selection". Therefore, combination is the selection of objects taken some or all at a time without specific order. Note that the order of the objects is meaningless in combination.


A combination of n different objects taken r objects at a time is denoted by $^nC_r$ or $C(n, r)$ or $\binom{n}{r}$ and defined as


$$^nC_r = \frac{n!}{(n - r)!\, r!} \quad \text{for } r \le n$$


Example 4.3: In how many ways can a committee of 4 persons be chosen out of 8 persons?


Solution: Here n = 8, r = 4. Required number of ways


$$^8C_4 = \frac{8!}{(8 - 4)!\, 4!} = \frac{8!}{4!\, 4!} = \frac{8 \times 7 \times 6 \times 5 \times 4!}{(4 \times 3 \times 2 \times 1) \times 4!} = 70 \text{ ways}$$
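A short sketch confirming the committee count, assuming Python 3.8 or later for `math.comb`. The person labels are placeholders used only for the enumeration.

```python
from itertools import combinations
from math import comb, factorial

n, r = 8, 4
by_formula = factorial(n) // (factorial(n - r) * factorial(r))
by_library = comb(n, r)

# Enumerate the committees explicitly; order within a committee is ignored
people = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]
committees = list(combinations(people, r))

print(by_formula, by_library, len(committees))  # 70 70 70
```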
4.3 Approaches of Probability


The chance of occurrence or non-occurrence of any event in a random experiment is called probability. There are four approaches to define the probability.
1. Mathematical or classical or priori approach
2. Statistical or empirical or relative frequency approach
3. Subjective approach
4. Axiomatic approach

4.3.1 Mathematical or Classical or Priori Approach of Probability 


Let n be the number of exhaustive, mutually exclusive and equally likely cases (or outcomes), out of which m are favourable to the happening of an event A. Then the probability of happening (occurrence) of an event A is given by


$$P(A) = \frac{\text{Favourable number of cases to } A}{\text{Exhaustive number of cases}} = \frac{m}{n} \qquad (1)$$


If (n - m) cases are favourable to the non-occurrence of an event A, then its probability is given by $P(A^c)$ or $P(\bar{A})$:


$$P(\bar{A}) = \frac{\text{Favourable number of cases for not happening of event } A}{\text{Exhaustive number of cases}} = \frac{n - m}{n} = 1 - \frac{m}{n} = 1 - P(A)$$


Hence, $P(A) + P(\bar{A}) = 1 \Rightarrow p + q = 1$, $q = 1 - p$.
This shows that total probability is equal to 1, i.e., 100%.


Note that probability lies between zero and 1, i.e. $0 \le p \le 1$.
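The classical definition translates directly into a ratio of counts. The sketch below uses exact fractions so that results such as 4/52 reduce automatically; the function name `classical_probability` is hypothetical, and the queen example is taken from the terminology section.

```python
from fractions import Fraction

def classical_probability(favourable, exhaustive):
    """P(A) = m / n for equally likely, mutually exclusive, exhaustive cases."""
    if exhaustive <= 0 or not 0 <= favourable <= exhaustive:
        raise ValueError("need 0 <= m <= n and n > 0")
    return Fraction(favourable, exhaustive)

p = classical_probability(4, 52)  # drawing a queen from a pack of 52 cards
q = 1 - p                         # probability of not drawing a queen

print(p, q, p + q)  # 1/13 12/13 1
```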
Limitations of Classical Approach of Probability 


This approach breaks down in the following cases 
a. If 

b. If the various outcomes of the 
c. If the actual value of n is not known 


… the ratio $\frac{m}{n}$ gives the relative frequency of the event A. In the limiting case, when n becomes sufficiently large, this number is called the probability of A and is symbolically denoted by

$$P(A) = \lim_{n \to \infty} \frac{m}{n}$$


Limitations of statistical or empirical approach of probability
a. The experimental conditions may not remain essentially homogeneous and identical in a large number of repetitions of the experiment.


b. The relative frequency $\frac{m}{n}$ may not attain a unique value, no matter however large n may be.
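The relative frequency idea can be illustrated with a simulated fair coin. This is only a sketch under stated assumptions: the pseudo-random generator stands in for a real experiment, and the seed is fixed so that repeated runs give the same figures.

```python
import random

def relative_frequency(trials, seed=0):
    """Relative frequency m/n of heads in `trials` simulated fair tosses."""
    rng = random.Random(seed)  # fixed seed: reproducible illustration
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

# The ratio m/n tends to settle near 0.5 as n grows
for n in (10, 1000, 100000):
    print(n, relative_frequency(n))
```

For small n the ratio can be far from 0.5, which reflects limitation (b) above: m/n need not equal the limiting value for any finite n.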


4.3.3 Subjective Approach
The subjective probability approach is purely individualistic in nature. Therefore, this approach of probability is completely based on the personal beliefs, feelings, experience, judgment and personal discretion of a person. Since different persons may assign different probabilities, it is difficult to draw objective conclusions using probabilities assigned by this subjective method. This type of probability is generally used by top level authorities on the basis of their discretion.


Note: When probability is not given directly, then it can be categorized into two cases:
Case I: When one item is selected at a time.
Case II: When more than one item is selected at a time.


Example 4.4: A card is drawn at random from a well-shuffled pack of 52 cards. What is the probability of getting (a) a red card, (b) a black king?


Solution: There are 52 cards in a pack. Total number of cases (n) = 52
(a) Favourable number of cases (m) = 26 (since there are 26 red cards)
Required probability of getting a red card is


P(a red card) $= \frac{m}{n} = \frac{26}{52} = \frac{1}{2}$


(b) Favourable number of cases (m) = 2 (there are 2 black kings)
Required probability of getting a black king is


P(a black king) $= \frac{2}{52} = \frac{1}{26}$

Example 4.5: A card is drawn from a pack of 52 cards at random. What is the probability that it is (i) a red card, (ii) a spade, (iii) a face card, (iv) an ace, (v) a red king, (vi) knave of heart, (vii) king or queen, (viii) heart or club, (ix) red 2 or black 8 or a queen, (x) spade or ace, (xi) heart or face card, (xii) red or face card.


Pack of cards (52)
Colour (2): Red (26), Black (26)
Suits (4): Diamond (13), Heart (13), Spade (13), Club (13)
Each suit: Ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K


Solution: Since there are 52 cards in a pack,
n = Total number of cases (or outcomes) = 52
Since one card is drawn at random,


(i) Let E denote the event of drawing a red card.
Favourable number of cases (m) = 26
P(a red) $= P(E) = \frac{m}{n} = \frac{26}{52} = \frac{1}{2}$
(ii) Let E denote the event of drawing a spade.
Favourable number of cases (m) = 13
P(a spade) $= P(E) = \frac{m}{n} = \frac{13}{52} = \frac{1}{4}$
(iii) Let E denote the event of drawing a face card.
Favourable number of cases (m) = 12 (3 face cards in each suit)
P(a face card) $= P(E) = \frac{m}{n} = \frac{12}{52} = \frac{3}{13}$
(iv) Let E denote the event of drawing an ace.
Favourable number of outcomes (m) = 4
P(an ace) $= \frac{4}{52} = \frac{1}{13}$
(v) Favourable number of cases (m) = number of red kings = 2
P(a red king) $= \frac{2}{52} = \frac{1}{26}$
(vi) Favourable number of cases (m) = number of knave of heart = 1
P(a knave of heart) $= \frac{1}{52}$
(vii) Favourable number of cases (m) = number of kings or queens = 4 + 4 = 8
P(king or queen) $= \frac{8}{52} = \frac{2}{13}$
(viii) m = 13 + 13 = 26
P(heart or club) $= \frac{26}{52} = \frac{1}{2}$
(ix) m = number of red 2 or black 8 or a queen = 2 + 2 + 4 = 8
P(a red 2 or black 8 or a queen) $= \frac{8}{52} = \frac{2}{13}$
(x) m = number of spade or ace = 13 + 4 - 1 = 16
P(a spade or ace) $= \frac{16}{52} = \frac{4}{13}$
(xi) Favourable number of cases (m) = drawing a heart or face card = 13 + 12 - 3 = 22 (3 face cards of heart are common)
P(a heart or face card) $= \frac{22}{52} = \frac{11}{26}$
(xii) Favourable number of cases (m) = drawing a red or face card = 26 + 12 - 6 = 32 (6 red face cards are common)
P(a red or face card) $= \frac{32}{52} = \frac{8}{13}$
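The "or" cases in this example, where common cards must not be counted twice, can be verified by building the pack and counting. The sketch below is illustrative; the rank and suit labels and the helper `prob` are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUIT_COLOUR = {"heart": "red", "diamond": "red", "spade": "black", "club": "black"}

# One (rank, suit) tuple per card: 13 ranks x 4 suits = 52 cards
deck = list(product(RANKS, SUIT_COLOUR))

def prob(event):
    """Classical probability m/n of `event` for one card drawn at random."""
    favourable = sum(1 for card in deck if event(card))
    return Fraction(favourable, len(deck))

def is_face(card):
    return card[0] in {"J", "Q", "K"}

p_spade_or_ace = prob(lambda c: c[1] == "spade" or c[0] == "A")
p_heart_or_face = prob(lambda c: c[1] == "heart" or is_face(c))
p_red_or_face = prob(lambda c: SUIT_COLOUR[c[1]] == "red" or is_face(c))

print(p_spade_or_ace, p_heart_or_face, p_red_or_face)  # 4/13 11/26 8/13
```

Because each card is tested once against the whole condition, overlaps such as the ace of spades are counted only once, which matches the subtraction step in the worked solution.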
Example 4.6: Twenty balls are numbered from 1 to 20. If one ball is drawn at random, what is the probability that the ball drawn is a multiple of 4 or 7?


Solution: Total number of cases (n) = 20
Favourable number of cases (m) = the number of multiples of 4 or 7,
i.e. {4, 7, 8, 12, 14, 16, 20}
m = 5 + 2 = 7


So, required probability that the ball drawn is a multiple of 4 or 7 $= \frac{m}{n} = \frac{7}{20}$
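A brief sketch of the same count in code, listing the favourable ball numbers and forming the ratio m/n with exact fractions.

```python
from fractions import Fraction

balls = range(1, 21)  # balls numbered 1 to 20
favourable = [b for b in balls if b % 4 == 0 or b % 7 == 0]
p = Fraction(len(favourable), len(balls))

print(favourable)  # [4, 7, 8, 12, 14, 16, 20]
print(p)           # 7/20
```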


Example 4.7: Two fair dice are thrown at random. What is the probability that the faces turned up show (i) a sum 7 (ii) a sum of 8 or 9 (iii) sum less than 5 (iv) number 6 in the first die (v) odd number in the second die (vi) same faces (vii) different faces.

Solution: Since two dice are thrown,


n = Total number of cases = 6 × 6 = 36


The sample space (total outcomes) is presented below:


Faces in first die | Faces in second die: 1, 2, 3, 4, 5, 6
1 | (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
2 | (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
3 | (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
4 | (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
5 | (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
6 | (6,1) (6,2) (6,3) (6,4) (6,5) (6,6)


(i) m = Favourable number of outcomes = getting a sum 7 in both dice = 6
i.e., (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)
P(a sum 7) $= \frac{m}{n} = \frac{6}{36} = \frac{1}{6}$


(ii) m = Favourable number of outcomes = number of cases getting a sum of 8 or 9 = 9
i.e., (2, 6), (3, 5), (4, 4), (5, 3), (6, 2), (3, 6), (4, 5), (5, 4), (6, 3)
P(a sum is 8 or 9) $= \frac{9}{36} = \frac{1}{4}$


(iii) m = Favourable number of outcomes = number of cases getting a sum of less than 5 = 6
i.e., (1, 1), (1, 2), (2, 1), (1, 3), (3, 1), (2, 2)
P(sum less than 5) $= \frac{6}{36} = \frac{1}{6}$


(iv) m = Favourable number of outcomes = number of cases getting a 6 in the first die = 6
i.e., (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)
P(6 in first die) $= \frac{6}{36} = \frac{1}{6}$


(v) m = Favourable number of outcomes = number of cases of odd number in the second die = 18
i.e., (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (1, 3), (2, 3), (3, 3), (4, 3), (5, 3), (6, 3), (1, 5), (2, 5), (3, 5), (4, 5), (5, 5), (6, 5)
P(odd number in the second die) $= \frac{18}{36} = \frac{1}{2}$


(vi) m = Favourable number of outcomes = 6
i.e., (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)
P(same faces in both dice) $= \frac{6}{36} = \frac{1}{6}$


(vii) m = Favourable number of outcomes = number of cases of different faces in both dice = 30
i.e., (1, 2), (2, 1), (1, 3), (3, 1), …
P(different faces) $= \frac{m}{n} = \frac{30}{36} = \frac{5}{6}$


Alternatively, since P(different faces) + P(same faces) = 1,
P(different faces) = 1 - P(same faces) $= 1 - \frac{6}{36} = \frac{5}{6}$
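Each part of this example amounts to counting points of the 36-point sample space, so the answers can be checked by enumeration. The helper `prob` below is a hypothetical name used only for this sketch.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first die, second die)
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """m/n over the 36 equally likely outcomes of two fair dice."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

p_sum_7 = prob(lambda o: sum(o) == 7)
p_sum_8_or_9 = prob(lambda o: sum(o) in (8, 9))
p_sum_below_5 = prob(lambda o: sum(o) < 5)
p_second_odd = prob(lambda o: o[1] % 2 == 1)
p_same = prob(lambda o: o[0] == o[1])
p_different = 1 - p_same  # complement rule

print(p_sum_7, p_sum_8_or_9, p_sum_below_5, p_second_odd, p_same, p_different)
```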


Example 4.8: What is the chance that a leap year selected randomly consists of 53 Sundays?


Solution: In a leap year, there are 366 days, i.e. 52 complete weeks and 2 days over. These 2 days may be either (i) Sunday and Monday or (ii) Monday and Tuesday or (iii) Tuesday and Wednesday or (iv) Wednesday and Thursday or (v) Thursday and Friday or (vi) Friday and Saturday or (vii) Saturday and Sunday.
Total number of cases (n) = 7, P(53 Sundays) = ?
Favourable number of cases (m) = number of cases consisting of Sunday = 2
Required probability that a leap year selected randomly consists of 53 Sundays, i.e.


$P(A) = \frac{m}{n} = \frac{2}{7}$
What is the chance that a non-leap year selected randomly consists of 53 Sundays?

Solution: In a non-leap year, there are 365 days, i.e. 52 complete weeks and 1 day over. This 1 day may be
either (i) Sunday or (ii) Monday or (iii) Tuesday or (iv) Wednesday or (v) Thursday or (vi) Friday or
(vii) Saturday.

Total number of cases (n) = 7, P(53 Sundays) = ?
Favourable number of cases (m) = Number of cases containing a Sunday = 1

Required probability that a non-leap year selected randomly consists of 53 Sundays, i.e.

$P(\text{53 Sundays}) = \frac{m}{n} = \frac{1}{7}$
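The counting argument in the two calendar examples above can be sketched in Python as follows. This is an illustrative sketch only: the function name `prob_53_sundays` is an assumption, and it relies on the same premise as the text, namely that each of the 7 possible positions of the leftover days is equally likely.

```python
from fractions import Fraction

DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]

def prob_53_sundays(days_in_year):
    """P(53 Sundays) for a year of the given length, by listing the leftover days."""
    extra = days_in_year % 7  # days left over after 52 complete weeks
    favourable = 0
    for start in range(7):  # the leftover run may begin on any weekday
        leftover = {DAYS[(start + k) % 7] for k in range(extra)}
        if "Sunday" in leftover:
            favourable += 1
    return Fraction(favourable, 7)

leap = prob_53_sundays(366)      # 2 leftover days
non_leap = prob_53_sundays(365)  # 1 leftover day
```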


What is the probability that a media sales person makes a commission (a) between Rs. 5,000 and Rs.
10,000, (b) less than Rs. 15,000?

Solution: Since the total number of sales persons is 300,
Total number of cases (n) = 300

(a) Favourable number of cases (m) = number of media sales persons who make a commission between Rs. 5,000 and Rs. 10,000 = 25

$P(\text{a media sales person makes a commission between Rs. 5,000 and Rs. 10,000}) = \frac{m}{n} = \frac{25}{300} = \frac{1}{12}$

(b) Favourable number of cases (m) = number of media sales persons who make a commission of less than Rs. 15,000 = 75

$P(\text{a media sales person makes a commission of less than Rs. 15,000}) = \frac{m}{n} = \frac{75}{300} = \frac{1}{4}$


Example 4.11 | A bag contains 8 red, 4 white and 5 black coloured balls. Three balls are drawn randomly
from the bag. Find the probability that (i) all are red, (ii) 2 are red and 1 is white, (iii) 2 are red and 1 is of
another colour, (iv) one ball of each colour is drawn.

Solution: Exhaustive (total) number of cases (n) = number of ways of selecting 3 balls out of 17 balls

$n = {}^{17}C_3 = \frac{17 \times 16 \times 15}{1 \times 2 \times 3} = 680$

(i) Favourable number of cases of drawing 3 red balls (m) = ${}^{8}C_3 = \frac{8 \times 7 \times 6}{1 \times 2 \times 3} = 56$

$P(\text{all are red}) = \frac{m}{n} = \frac{56}{680} = 0.082$

(ii) Favourable cases for 2 red and 1 white (m) = ${}^{8}C_2 \times {}^{4}C_1 = 28 \times 4 = 112$

$P(\text{2 are red and 1 is white}) = \frac{m}{n} = \frac{112}{680} = 0.1647$

(iii) Favourable cases for exactly 2 red among the three drawn balls, i.e. 2 red and 1 of another colour,

$m = {}^{8}C_2 \times {}^{9}C_1 = \frac{8 \times 7}{1 \times 2} \times 9 = 252$

$P(\text{2 are red and 1 other}) = \frac{m}{n} = \frac{252}{680} = 0.3706$

(iv) Favourable number of cases for one ball of each colour (m) = ${}^{8}C_1 \times {}^{4}C_1 \times {}^{5}C_1 = 8 \times 4 \times 5 = 160$

Hence, $P(\text{one ball of each colour}) = \frac{m}{n} = \frac{160}{680} = \frac{4}{17} = 0.235$
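As a cross-check on the combination formulas used in Example 4.11, the 680 possible draws can be listed explicitly. The sketch below is illustrative only; the labels "R", "W", "B" and the helper `prob` are assumptions introduced for the check.

```python
from fractions import Fraction
from itertools import combinations

# 8 red, 4 white and 5 black balls, each ball treated as distinct.
balls = ["R"] * 8 + ["W"] * 4 + ["B"] * 5

# Every unordered selection of 3 distinct balls (by position in the list).
draws = list(combinations(range(len(balls)), 3))

def prob(event):
    """m/n, where `event` is a predicate on the list of 3 drawn colours."""
    m = sum(1 for d in draws if event([balls[i] for i in d]))
    return Fraction(m, len(draws))

p_all_red = prob(lambda c: c.count("R") == 3)
p_two_red_one_white = prob(lambda c: c.count("R") == 2 and c.count("W") == 1)
p_two_red_one_other = prob(lambda c: c.count("R") == 2)
p_each_colour = prob(lambda c: len(set(c)) == 3)
```

Brute-force listing is only practical for small bags, but it confirms the counts 56, 112, 252 and 160 without relying on the formulas.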


Example 4.12 | Five men in a group of 20 are graduates. If 3 are chosen out of 20 at random, what is the
probability that
a) all are graduates
b) none of them is a graduate
c) at least one of them is a graduate

Solution: Here, Total number of men = 20, Number of graduates = 5
Number of non-graduates = 20 - 5 = 15
If three are chosen at random,
Total number of possible outcomes (n) = ${}^{20}C_3 = \frac{20!}{(20-3)!\,3!} = \frac{20 \times 19 \times 18 \times 17!}{17!\,3!} = 1140$

(a) Favourable number of cases (m) = ${}^{5}C_3 = \frac{5!}{(5-3)!\,3!} = 10$

$P(\text{all are graduates}) = \frac{{}^{5}C_3}{{}^{20}C_3} = \frac{10}{1140} = \frac{1}{114}$

(b) Favourable number of cases for none of them being a graduate (m) = ${}^{15}C_3 = \frac{15!}{(15-3)!\,3!} = 455$

$P(\text{none of them is a graduate}) = \frac{{}^{15}C_3}{{}^{20}C_3} = \frac{455}{1140} = \frac{91}{228}$

(c) $P(\text{at least one of them is a graduate}) = 1 - P(\text{none of them is a graduate}) = 1 - \frac{91}{228} = \frac{137}{228}$


Example 4.13 | A class consists of 40 boys and 60 girls. If two students are chosen at random, what will
be the probability that
(a) both are boys (b) both are girls (c) one is a boy and one is a girl

Solution: Number of boys in the class = 40, Number of girls in the class = 60.
Total number of students = 40 + 60 = 100
If two students are chosen at random, total number of possible outcomes (n) = ${}^{100}C_2 = 4950$

(i) Favourable cases for both being boys (m) = ${}^{40}C_2 = 780$

$P(\text{both are boys}) = \frac{{}^{40}C_2}{{}^{100}C_2} = \frac{780}{4950} = \frac{26}{165}$

(ii) Favourable cases for both being girls (m) = ${}^{60}C_2 = 1770$

$P(\text{both are girls}) = \frac{{}^{60}C_2}{{}^{100}C_2} = \frac{1770}{4950} = \frac{177}{495} = \frac{59}{165}$

(iii) Favourable number of cases for one boy and one girl (m) = ${}^{40}C_1 \times {}^{60}C_1 = 40 \times 60 = 2400$

$P(\text{one boy and one girl}) = \frac{{}^{40}C_1 \times {}^{60}C_1}{{}^{100}C_2} = \frac{2400}{4950} = \frac{48}{99} = \frac{16}{33}$
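The selection problems above (graduates and non-graduates, boys and girls) all follow one pattern: multiply the combinations for each group and divide by the combinations for the whole. A minimal Python sketch of that pattern is given below; the function name `prob_select` and its argument layout are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def prob_select(group_sizes, picks):
    """Probability of drawing picks[i] members from group i when
    sum(picks) members are chosen at random from all groups together."""
    total = comb(sum(group_sizes), sum(picks))  # n: all possible selections
    favourable = 1
    for size, k in zip(group_sizes, picks):
        favourable *= comb(size, k)             # m: product over the groups
    return Fraction(favourable, total)

# 5 graduates and 15 non-graduates, 3 chosen.
p_all_graduates = prob_select((5, 15), (3, 0))
p_no_graduate = prob_select((5, 15), (0, 3))
p_at_least_one_graduate = 1 - p_no_graduate

# 40 boys and 60 girls, 2 chosen.
p_both_boys = prob_select((40, 60), (2, 0))
p_both_girls = prob_select((40, 60), (0, 2))
p_boy_and_girl = prob_select((40, 60), (1, 1))
```

Note that `math.comb` requires Python 3.8 or later.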


4.4 Laws of Probability 


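There are two important theorems on probability. The following are the laws of probability.

1. Additive law of probability 2. Multiplicative law of probability

4.4.1 Additive Law of Probability (Law of total probability)
Case I: When the events are not mutually exclusive

If A and B are not mutually exclusive events, then the probability of
occurrence of at least one of them is given by

$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B) - P(A \cap B)$

Notes:

1. If the events are mutually exclusive,
$P(A \cup B) = P(A) + P(B)$ because $P(A \cap B) = 0$

2. Also, we have $P(A \cup B) + P(\overline{A \cup B}) = 1$, so that
$P(\overline{A \cup B}) = 1 - P(A \cup B)$

3. By De Morgan's law,
$P(\overline{A \cup B}) = P(\bar{A} \cap \bar{B})$ and $P(\overline{A \cap B}) = P(\bar{A} \cup \bar{B})$

4. If A, B and C are three not mutually exclusive events, then the probability of occurrence of at least one of them
is given by
$P(A \text{ or } B \text{ or } C) = P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(C \cap A) + P(A \cap B \cap C)$

5. Also, we have $P(A \cup B \cup C) + P(\overline{A \cup B \cup C}) = 1$, so that
$P(\overline{A \cup B \cup C}) = 1 - P(A \cup B \cup C)$

6. De Morgan's law:
$P(\overline{A \cup B \cup C}) = P(\bar{A} \cap \bar{B} \cap \bar{C})$ and $P(\overline{A \cap B \cap C}) = P(\bar{A} \cup \bar{B} \cup \bar{C})$

7. $P(A \cup B \cup C) = 1 - P(\bar{A} \cap \bar{B} \cap \bar{C})$

8. In general,
$P(A_1 \cup A_2 \cup A_3 \cup \dots \cup A_n) = 1 - P(\overline{A_1 \cup A_2 \cup \dots \cup A_n}) = 1 - P(\bar{A}_1 \cap \bar{A}_2 \cap \dots \cap \bar{A}_n)$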
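The three-event additive law and De Morgan's law can be verified numerically on a small sample space. The sketch below is an illustration under assumed data: one number is drawn at random from 1 to 30, and A, B, C are the events "multiple of 2", "multiple of 3" and "multiple of 5". None of these choices come from the text.

```python
from fractions import Fraction

OMEGA = set(range(1, 31))  # hypothetical sample space: numbers 1..30

def P(event):
    """Classical probability of an event given as a subset of OMEGA."""
    return Fraction(len(event), len(OMEGA))

A = {x for x in OMEGA if x % 2 == 0}
B = {x for x in OMEGA if x % 3 == 0}
C = {x for x in OMEGA if x % 5 == 0}

# Left side: direct count of the union.
direct = P(A | B | C)

# Right side: additive law for three events that are not mutually exclusive.
by_formula = (P(A) + P(B) + P(C)
              - P(A & B) - P(B & C) - P(C & A)
              + P(A & B & C))

# De Morgan: the complement of the union equals the intersection of complements.
none_occurs = P((OMEGA - A) & (OMEGA - B) & (OMEGA - C))
```

Both routes give 22 favourable numbers out of 30, and the complement route gives the remaining 8.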


Example 4.14 | The probability that a new airport will get an award for its design is 0.16, the probability 


that it will get an award for the efficient use of materials is 0.24, and the probability that it will get 
both awards is 0.11. 


| a. What is the probability that it will get at least one of the two awards? 
b. What is the probability that it will get only one of two awards? 

Solution: Probability that a new airport will get an award for its design, P(A) = 0.16
Probability that a new airport will get an award for the efficient use of materials, P(B) = 0.24
Probability that a new airport will get both awards, $P(A \cap B)$ = 0.11

a. Probability that it will get at least one of the two awards,
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
= 0.16 + 0.24 - 0.11 = 0.29
b. Probability that it will get only one of the two awards
$= P(\text{only } A) + P(\text{only } B)$
$= [P(A) - P(A \cap B)] + [P(B) - P(A \cap B)]$
= (0.16 - 0.11) + (0.24 - 0.11) = 0.05 + 0.13 = 0.18


Example 4.15 | The probability that a company will get contract A is …, the probability that the company will get contract B is … and the
probability that the company will get both the contracts is …. What is the probability that the
company will get contract A or B?

Solution: We have P(A) = …, P(B) = …, $P(A \cap B)$ = …, P(A or B) = ?

$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B) - P(A \cap B)$


Example 4.16 | The probability that a boy will get a scholarship is 0.9 and that a girl will get it is 0.8. What
is the probability that at least one of them will get the scholarship?

Solution: Let B = event of a boy getting the scholarship and G = event of a girl getting the scholarship.
Then, P(B) = 0.9, $P(\bar{B})$ = 0.1 and P(G) = 0.8, $P(\bar{G})$ = 0.2
The probability that at least one of them will get the scholarship is given by
$P(B \cup G) = 1 - P(\overline{B \cup G}) = 1 - P(\bar{B} \cap \bar{G})$ (using De Morgan's law)
$= 1 - P(\bar{B}) \times P(\bar{G})$ (since a boy and a girl getting the scholarship are independent)
= 1 - 0.1 × 0.2 = 0.98

Alternative method (i)
The probability that at least one of them will get the scholarship is given by $P(B \cup G)$ = P(boy gets the
scholarship and girl does not, or girl gets the scholarship and boy does not, or both get the scholarship)
$= P(B \cap \bar{G} \text{ or } G \cap \bar{B} \text{ or } B \cap G)$
$= P(B \cap \bar{G}) + P(G \cap \bar{B}) + P(B \cap G)$
$= P(B) \times P(\bar{G}) + P(G) \times P(\bar{B}) + P(B) \times P(G)$
= 0.9 × 0.2 + 0.8 × 0.1 + 0.9 × 0.8 = 0.98

Alternative method (ii)
The probability that at least one of them will get the scholarship is given by
$P(B \cup G) = P(B) + P(G) - P(B \cap G)$
$= P(B) + P(G) - P(B) \times P(G)$
= 0.9 + 0.8 - 0.9 × 0.8 = 0.98


Example 4.17 | A bag contains 24 balls numbered from 1 to 24. One ball is drawn at random. Find the
probability that the ball drawn has a number which is a multiple of 3 or 4.

Solution: Let A and B be the events of drawing a number which is a multiple of 3 and of 4 respectively.
Then, A = {3, 6, 9, 12, 15, 18, 21, 24}, B = {4, 8, 12, 16, 20, 24}

$P(A) = \frac{8}{24}$, $P(B) = \frac{6}{24}$

But 12 and 24 are repeated, so $P(A \cap B) = \frac{m}{n} = \frac{2}{24}$

$P(\text{3 or 4}) = P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{8}{24} + \frac{6}{24} - \frac{2}{24} = \frac{12}{24} = \frac{1}{2}$


Case II: When the events are mutually exclusive

Let A and B be two mutually exclusive events. Then the probability
of occurrence of either event A or event B is the sum of their
individual probabilities. Hence, the probability of occurrence of
either event A or event B is given by

$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B)$

If A, B and C are three mutually exclusive events, then the
probability of occurrence of either event A or B or C is given by

$P(A \text{ or } B \text{ or } C) = P(A \cup B \cup C) = P(A) + P(B) + P(C)$

In general, if events $A_1, A_2, \dots, A_n$ are n mutually exclusive events, then
$P(A_1 \cup A_2 \cup \dots \cup A_n) = P(A_1) + P(A_2) + P(A_3) + \dots + P(A_n)$


Example 4.18 | The probability that a company executive will travel by plane is … and that he will travel by
train is …. Find the probability of travelling by plane or train.
Solution: Let A and B be the events that a company executive will travel by plane and by train respectively.
P(A) = …, P(B) = …, P(plane or train) = P(A or B) = ?
Since A and B are mutually exclusive, $P(A \cup B) = P(A) + P(B)$
Example 4.19 | If an experiment has the three possible and mutually exclusive outcomes A, B and C,
check in each case whether the assignment of probabilities is permissible.
(a) P(A) = 1/3, P(B) = 1/3, P(C) = 1/3
(b) P(A) = 0.35, P(B) = 0.52, P(C) = 0.26

Solution: (a) $P(A) + P(B) + P(C) = \frac{1}{3} + \frac{1}{3} + \frac{1}{3} = 1$, which is permissible.

(b) P(A) + P(B) + P(C) = 0.35 + 0.52 + 0.26 = 1.13 > 1, which is impossible.


Example 4.20 | Three events A, B and C are mutually exclusive events and their respective probabilities
are as follows.

P(A) = 2/3; P(B) = 1/4; P(C) = 1/6. Comment on the result.

Solution: If A, B and C are mutually exclusive events, then

$P(A \text{ or } B \text{ or } C) = P(A \cup B \cup C) = P(A) + P(B) + P(C) = \frac{2}{3} + \frac{1}{4} + \frac{1}{6}$

$P(A \text{ or } B \text{ or } C) = \frac{13}{12} = 1.08 > 1$, which is not possible.

Hence, the given information is not correct.
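The check applied in the two preceding examples can be written as a small function. This is a sketch under stated assumptions: the name `is_permissible` and the `exhaustive` flag are introduced here for illustration. When the mutually exclusive outcomes are also the only possible ones, the probabilities must add up to exactly 1; otherwise their sum only needs to stay at or below 1.

```python
from fractions import Fraction

def is_permissible(probs, exhaustive=True):
    """Check an assignment of probabilities to mutually exclusive events.

    Each probability must lie in [0, 1]. The sum must equal 1 when the
    events are exhaustive, and must not exceed 1 otherwise.
    """
    if any(p < 0 or p > 1 for p in probs):
        return False
    total = sum(probs)
    return total == 1 if exhaustive else total <= 1

thirds = [Fraction(1, 3)] * 3
decimals = [Fraction("0.35"), Fraction("0.52"), Fraction("0.26")]
mixed = [Fraction(2, 3), Fraction(1, 4), Fraction(1, 6)]

ok_a = is_permissible(thirds)                   # three equal outcomes
ok_b = is_permissible(decimals)                 # sums to 1.13
ok_c = is_permissible(mixed, exhaustive=False)  # sums to 13/12
```

Exact fractions are used instead of floats so that a sum such as 1/3 + 1/3 + 1/3 compares equal to 1 without rounding error.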
Example 4.21 | Suppose that a manager of a large apartment complex provides the following subjective
probability estimates about the number of vacancies that will exist next month.

List the sample points in each of the following events and provide the probability of the event:
(a) no vacancies, (b) at least four vacancies, (c) two or fewer vacancies.

Solution: Suppose the number of vacancies 0 to 5 is denoted by $A_0$ to $A_5$ respectively.

(a) Probability of no vacancies,
$P(A_0) = 0.05$
(b) Probability of at least four vacancies,
$P(A_4 \cup A_5) = P(A_4) + P(A_5) = 0.10 + \dots$
(c) Probability of two or fewer vacancies,
$P(A_2 \cup A_1 \cup A_0) = P(A_2) + P(A_1) + P(A_0) = 0.35 + 0.15 + 0.05 = 0.55$


Example 4.22 | A card is drawn at random from a pack of 52 cards. Find the probability of drawing (i) a
jack, a queen or a king, (ii) a card which is neither a jack, a queen nor a king.

Solution: Let A, B and C be the events of drawing a jack, a queen and a king respectively. Then

$P(A) = \frac{4}{52}$, $P(B) = \frac{4}{52}$, $P(C) = \frac{4}{52}$

Probability of drawing a jack, a queen or a king
= P(a jack, a queen or a king) = P(A or B or C)

$= P(A \cup B \cup C) = P(A) + P(B) + P(C) = \frac{4}{52} + \frac{4}{52} + \frac{4}{52} = \frac{12}{52} = \frac{3}{13}$

$P(\text{neither a jack, a queen nor a king}) = 1 - P(\text{a jack, a queen or a king}) = 1 - \frac{3}{13} = \frac{10}{13}$
4.4.2 Multiplicative Law of Probability 
Case I: For independent events
If A and B are two independent events, then the probability of occurrence of both the events is the
product of their individual probabilities, i.e.,
$P(A \text{ and } B) = P(A \cap B) = P(A) \cdot P(B)$
If A, B and C are three independent events, then
$P(A \cap B \cap C) = P(A) \cdot P(B) \cdot P(C)$
In general, if $A_1, A_2, \dots, A_n$ are independent events, then

$P(A_1 \cap A_2 \cap \dots \cap A_n) = P(A_1) \cdot P(A_2) \dots P(A_n)$

where,
$P(A \cap B)$ = Joint probability of events A and B
$P(A \cap B \cap C)$ = Joint probability of events A, B and C
$P(A_1 \cap A_2 \cap \dots \cap A_n)$ = Joint probability of events $A_1, A_2, \dots, A_n$
$P(A)$, $P(B)$, $P(C)$, $P(A_1)$, $P(A_2)$ etc. are the marginal probabilities of occurrence of events
A, B, C, $A_1$, $A_2$ etc. respectively.


Example 4.23 | Mr. X and Mr. Y appear in an interview for getting the scholarship. The
probability of getting the scholarship by Mr. X is $\frac{1}{7}$ and of getting it by Mr. Y is $\frac{1}{5}$. What is the probability that
a) both of them will get the scholarship,
b) only one of them will get the scholarship,
c) none of them will get the scholarship?

Solution: Let
P(A) = Probability that Mr. X will get the scholarship
$P(\bar{A})$ = Probability that Mr. X will not get the scholarship
P(B) = Probability that Mr. Y will get the scholarship
$P(\bar{B})$ = Probability that Mr. Y will not get the scholarship

Given that,

$P(A) = \frac{1}{7}$ and $P(\bar{A}) = 1 - \frac{1}{7} = \frac{6}{7}$; $P(B) = \frac{1}{5}$ and $P(\bar{B}) = 1 - \frac{1}{5} = \frac{4}{5}$

(a) Probability that both of them will get the scholarship,

$P(A \cap B) = P(A) \cdot P(B) = \frac{1}{7} \times \frac{1}{5} = \frac{1}{35}$ [since A and B are independent]

(b) Probability that only one of them will get the scholarship
$= P(A \cap \bar{B} \text{ or } \bar{A} \cap B) = P(A \cap \bar{B}) + P(\bar{A} \cap B)$
$= P(A) \cdot P(\bar{B}) + P(\bar{A}) \cdot P(B)$
$= \frac{1}{7} \times \frac{4}{5} + \frac{6}{7} \times \frac{1}{5} = \frac{10}{35} = \frac{2}{7}$

(c) Probability that none of them will get the scholarship
$P(\bar{A} \cap \bar{B}) = P(\bar{A}) \cdot P(\bar{B}) = \frac{6}{7} \times \frac{4}{5} = \frac{24}{35}$


Example 4.24 | The probability that a man will be alive 25 years hence is 0.3 and the probability that his wife
will be alive 25 years hence is 0.4. Find the probability that 25 years hence (i) both will be alive, (ii)
only the man will be alive, (iii) only the woman will be alive, (iv) none will be alive, (v) at least one of
them will be alive.

Solution: Probability that the man will be alive 25 years hence, i.e.
P(A) = 0.3 and then, $P(\bar{A})$ = 1 - 0.3 = 0.7
Similarly, probability that his wife will be alive 25 years hence, i.e.
P(B) = 0.4, $P(\bar{B})$ = 1 - 0.4 = 0.6
(i) Required probability that both will be alive 25 years hence
$P(A \cap B) = P(A) \times P(B)$ = 0.3 × 0.4 = 0.12
(ii) Probability that only the man will be alive
$= P(A \cap \bar{B}) = P(A) \times P(\bar{B})$ = 0.3 × 0.6 = 0.18
(iii) Probability that only the woman will be alive
$= P(B \cap \bar{A}) = P(B) \times P(\bar{A})$ = 0.4 × 0.7 = 0.28
(iv) Probability that none of them will be alive
$= P(\bar{A} \cap \bar{B}) = P(\bar{A}) \times P(\bar{B})$ = 0.7 × 0.6 = 0.42
(v) Required probability that at least one of them will be alive
$= P(A \cap \bar{B} \text{ or } \bar{A} \cap B \text{ or } A \cap B)$
$= [P(A) \times P(\bar{B})] + [P(\bar{A}) \times P(B)] + [P(A) \times P(B)]$
= 0.3 × 0.6 + 0.7 × 0.4 + 0.3 × 0.4 = 0.18 + 0.28 + 0.12 = 0.58
Alternatively, (v) can be calculated as follows.

$P(A \cup B) = 1 - P(\bar{A}) \cdot P(\bar{B})$ = 1 - 0.7 × 0.6 = 0.58
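For two independent events, every case asked for in the survival example above follows from the two marginal probabilities. The sketch below collects them in one place; the function name `two_independent` and the dictionary keys are assumptions made for illustration.

```python
from fractions import Fraction

def two_independent(p_a, p_b):
    """Joint probabilities for two independent events A and B."""
    q_a, q_b = 1 - p_a, 1 - p_b  # probabilities of the complements
    return {
        "both": p_a * p_b,
        "only_a": p_a * q_b,
        "only_b": q_a * p_b,
        "none": q_a * q_b,
        "at_least_one": 1 - q_a * q_b,
    }

# Man alive with probability 0.3, wife alive with probability 0.4.
result = two_independent(Fraction(3, 10), Fraction(2, 5))
```

The four cases "both", "only_a", "only_b" and "none" are mutually exclusive and exhaustive, so they must add up to 1, which gives a quick sanity check on any hand calculation.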


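Example 4.25 | A problem in statistics is given to three students A, B and C whose chances of solving
it are $\frac{1}{3}$, $\frac{1}{4}$ and $\frac{1}{5}$ respectively. Find the probability that
(a) The problem will be solved (b) Only one of them can solve the problem
(c) None of them will solve the problem (d) A can solve the problem but B and C cannot
(e) All three students A, B and C can solve the problem

Solution: Given,
Probability that A solves the problem, i.e. $P(A) = \frac{1}{3}$ and $P(\bar{A}) = 1 - \frac{1}{3} = \frac{2}{3}$
Probability that B solves the problem, i.e. $P(B) = \frac{1}{4}$ and $P(\bar{B}) = 1 - \frac{1}{4} = \frac{3}{4}$
Probability that C solves the problem, i.e. $P(C) = \frac{1}{5}$ and $P(\bar{C}) = 1 - \frac{1}{5} = \frac{4}{5}$

(a) Required probability that the problem will be solved is given by
$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(C \cap A) + P(A \cap B \cap C)$
$= P(A) + P(B) + P(C) - P(A) P(B) - P(B) P(C) - P(C) P(A) + P(A) P(B) P(C)$
$= \frac{1}{3} + \frac{1}{4} + \frac{1}{5} - \frac{1}{3} \cdot \frac{1}{4} - \frac{1}{4} \cdot \frac{1}{5} - \frac{1}{5} \cdot \frac{1}{3} + \frac{1}{3} \cdot \frac{1}{4} \cdot \frac{1}{5} = \frac{36}{60} = \frac{3}{5}$
[Since events A, B and C are independent]

Alternatively: Required probability that the problem will be solved

$= 1 - P(\bar{A}) \cdot P(\bar{B}) \cdot P(\bar{C}) = 1 - \frac{2}{3} \times \frac{3}{4} \times \frac{4}{5} = \frac{3}{5}$

(b) Probability that only one of them can solve the problem
$= P(A \cap \bar{B} \cap \bar{C}) + P(\bar{A} \cap B \cap \bar{C}) + P(\bar{A} \cap \bar{B} \cap C)$
$= P(A) \cdot P(\bar{B}) \cdot P(\bar{C}) + P(\bar{A}) \cdot P(B) \cdot P(\bar{C}) + P(\bar{A}) \cdot P(\bar{B}) \cdot P(C)$
$= \frac{1}{3} \times \frac{3}{4} \times \frac{4}{5} + \frac{2}{3} \times \frac{1}{4} \times \frac{4}{5} + \frac{2}{3} \times \frac{3}{4} \times \frac{1}{5} = \frac{26}{60} = 0.433$

(c) Probability that none of them will solve the problem,

$P(\bar{A} \cap \bar{B} \cap \bar{C}) = P(\bar{A}) \cdot P(\bar{B}) \cdot P(\bar{C}) = \frac{2}{3} \times \frac{3}{4} \times \frac{4}{5} = \frac{2}{5}$

(d) Probability that A solves it but B and C cannot,

$P(A \cap \bar{B} \cap \bar{C}) = P(A) \times P(\bar{B}) \times P(\bar{C}) = \frac{1}{3} \times \frac{3}{4} \times \frac{4}{5} = \frac{1}{5} = 0.2$

(e) Probability that all three students can solve the problem,

$P(A \cap B \cap C) = P(A) \cdot P(B) \cdot P(C) = \frac{1}{3} \times \frac{1}{4} \times \frac{1}{5} = \frac{1}{60} = 0.017$

Example 4.26 | A box contains 6 red and 4 white balls and another box contains 3 red and 5 white balls. One
ball is drawn at random from each box. Find the probability that (i) both are red; (ii) one is red
and one is white; (iii) both are of the same colour.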


Solution:

Box 1st: 6 red, 4 white (1 ball drawn) | Box 2nd: 3 red, 5 white (1 ball drawn)

Let,

$P(R_1)$ = Probability of getting a red ball from the 1st box = $\frac{6}{10}$

$P(W_1)$ = Probability of getting a white ball from the 1st box = $\frac{4}{10}$

$P(R_2)$ = Probability of getting a red ball from the 2nd box = $\frac{3}{8}$
$P(W_2)$ = Probability of getting a white ball from the 2nd box = $\frac{5}{8}$

(i) The probability of getting both red balls is given by,

$P(\text{both red balls}) = P(R_1 \cap R_2) = P(R_1) \times P(R_2) = \frac{6}{10} \times \frac{3}{8} = \frac{9}{40}$

(ii) The probability that one is red and the other is white is given by,

P(one red and one white) = P(red ball from 1st box and white from 2nd box, or white ball from
1st box and red from 2nd box)

$= P(R_1 \cap W_2 \text{ or } W_1 \cap R_2) = P(R_1 \cap W_2) + P(W_1 \cap R_2)$

$= P(R_1) \times P(W_2) + P(W_1) \times P(R_2) = \frac{6}{10} \times \frac{5}{8} + \frac{4}{10} \times \frac{3}{8} = \frac{21}{40}$

The probability of different colours means the probability of getting a ball of each colour. In the above
example the probability of different colours is given by,
P(different colours) = P(one red and one white)

(iii) The probability of getting balls of the same colour is given by,

P(same colours) = P(both red or both white) = P(both red) + P(both white)
$= P(R_1 \cap R_2) + P(W_1 \cap W_2) = P(R_1) \times P(R_2) + P(W_1) \times P(W_2)$
$= \frac{6}{10} \times \frac{3}{8} + \frac{4}{10} \times \frac{5}{8} = \frac{19}{40}$


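Example 4.27 | The odds in favour that A speaks the truth are 3 : 2 and the odds in favour that B speaks
the truth are 5 : 3. In what percentage of cases are they likely to contradict, and not to contradict, each
other on an identical point?

Solution: Given, Probability that A speaks the truth = $P(A) = \frac{3}{3+2} = \frac{3}{5}$

Probability that A does not speak the truth = $P(\bar{A}) = \frac{2}{3+2} = \frac{2}{5}$

Probability that B speaks the truth = $P(B) = \frac{5}{5+3} = \frac{5}{8}$

Probability that B does not speak the truth = $P(\bar{B}) = \frac{3}{5+3} = \frac{3}{8}$

Required probability that they contradict each other on an identical point
$= P(A) \cdot P(\bar{B}) + P(\bar{A}) \cdot P(B)$ [since A and B are independent]
$= \frac{3}{5} \times \frac{3}{8} + \frac{2}{5} \times \frac{5}{8} = \frac{19}{40}$

Percentage of contradiction = $\frac{19}{40} \times 100 = 47.5\%$

Required probability that they do not contradict each other on an identical point
$= P(A) \cdot P(B) + P(\bar{A}) \cdot P(\bar{B})$ [since A and B are independent]
$= \frac{3}{5} \times \frac{5}{8} + \frac{2}{5} \times \frac{3}{8} = \frac{21}{40} = \frac{21}{40} \times 100 = 52.5\%$

Alternatively,
Required probability that they do not contradict each other on an identical point
$= 1 - \frac{19}{40} = \frac{21}{40} = \frac{21}{40} \times 100 = 52.5\%$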
Case II: For dependent events
If A and B are two dependent events, then the probability of simultaneous happening of the two
events A and B is given by

$P(A \cap B) = P(A) \cdot P(B/A)$

Similarly, $P(A \cap B) = P(B) \cdot P(A/B)$
where P(B/A) is the conditional probability of the occurrence of event B given that (if) event
A has already occurred, and P(A/B) is the conditional probability of occurrence (happening) of
event A given that event B has already occurred (happened).
If A, B and C are three dependent events, then

$P(A \cap B \cap C) = P(A) \cdot P(B/A) \cdot P(C/A \cap B)$
where $P(C/A \cap B)$ is the conditional probability of occurrence (happening) of event C given
that (if) both events A and B have already occurred (happened).

In general, for n events $A_1, A_2, A_3, \dots, A_n$,
we have, $P(A_1 \cap A_2 \cap \dots \cap A_n) = P(A_1) \cdot P(A_2/A_1) \cdot P(A_3/A_1 \cap A_2) \dots P(A_n/A_1 \cap A_2 \cap \dots \cap A_{n-1})$

Note: (i) If events A and B are independent, then P(A/B) = P(A) and P(B/A) = P(B)
(ii) $P(A \cap B)$ can also be denoted by P(AB)


Conditional Probability

Conditional probability is the probability that an event will occur given that another event has
already occurred. If A and B are two dependent events, then the conditional probability of event A given
that (if) event B has already occurred is given by,

$P(A/B) = \frac{P(A \cap B)}{P(B)}$, provided P(B) > 0

Similarly, the conditional probability of event B given that (if) event A
has already occurred is given by

$P(B/A) = \frac{P(A \cap B)}{P(A)}$, provided P(A) > 0

Example 4.28 | The probability that a manufacturer will produce 'brand X' product is 0.13, the
probability that he will produce 'brand Y' product is 0.28 and the probability that he will produce
both is 0.06. What is the probability that the manufacturer who has produced 'brand Y' will
also have produced 'brand X'?

Solution: Let X and Y be the events that a manufacturer will produce brand X and brand Y respectively.

P(X) = 0.13, P(Y) = 0.28, $P(X \cap Y)$ = 0.06, P(X/Y) = ?

$P(X/Y) = \frac{P(X \cap Y)}{P(Y)} = \frac{0.06}{0.28} = 0.214$

The probability that the manufacturer who has produced 'brand Y' will also have produced
'brand X' is 0.214.


Example 4.29 | In an examination, 20% of students failed in English, 15% of students failed in Mathematics
and 10% of students failed in both English and Mathematics. A student is selected at random. If he
failed in English, what is the probability that he also failed in Mathematics?

Solution: Let E and M be the events that denote that a student failed in English and failed in Mathematics
respectively. Then

P(E) = 20% = 0.2, P(M) = 15% = 0.15
$P(E \cap M)$ = 10% = 0.10

If one student is selected at random, the probability that he also failed in Mathematics, given that he failed in
English, is given by

$P(M/E) = \frac{P(M \cap E)}{P(E)} = \frac{0.1}{0.2} = \frac{1}{2}$
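The conditional probability formula P(A/B) = P(A ∩ B) / P(B) used in the last two examples can be sketched as a small function. The name `conditional` is an assumption for illustration; the guard mirrors the condition P(B) > 0 stated in the definition.

```python
from fractions import Fraction

def conditional(p_joint, p_given):
    """P(A/B) = P(A and B) / P(B), defined only when P(B) > 0."""
    if p_given <= 0:
        raise ValueError("the conditioning event must have positive probability")
    return p_joint / p_given

# Brand example: P(X and Y) = 0.06, P(Y) = 0.28.
p_x_given_y = conditional(Fraction(6, 100), Fraction(28, 100))

# Examination example: P(E and M) = 0.10, P(E) = 0.20.
p_m_given_e = conditional(Fraction(10, 100), Fraction(20, 100))
```

The first result is the exact fraction 3/14, which rounds to the 0.214 quoted in the text.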
Example 4.30 | What is the probability that a couple's second child will be
(a) a boy, given that their first child was a girl, (b) a girl, given that their first child was a girl?

Solution: Let $B_1$ and $B_2$ be the events that denote that the first child is a boy and the second child is a boy
respectively.

Similarly, let $G_1$ and $G_2$ be the events that denote that the first child is a girl and the second child is a girl
respectively. There are only two possibilities at each birth: either a boy or a girl. Then

Probability of a boy at either birth, $P(B_1) = P(B_2) = 1/2$; probability of a girl at either birth, $P(G_1) = P(G_2) = 1/2$
(a) Probability that the second child is a boy given that the first child was a girl,

$P(B_2/G_1) = \frac{P(B_2 \cap G_1)}{P(G_1)} = \frac{P(B_2) \cdot P(G_1)}{P(G_1)} = \frac{1}{2}$ [since $B_2$ and $G_1$ are independent]

(b) Probability that the second child is a girl given that the first child was a girl,

$P(G_2/G_1) = \frac{P(G_2 \cap G_1)}{P(G_1)} = \frac{P(G_2) \cdot P(G_1)}{P(G_1)} = \frac{1}{2}$

[since the first birth and the second birth are independent]


Example 4.31 | A bag contains 8 red and 6 white balls. Two balls are drawn from the bag one
after the other without replacement. Find the probability that both balls drawn are white.

Solution: Probability of drawing a white ball in the first draw = $P(W_1) = \frac{6}{14}$

Probability of drawing a white ball in the second draw given that the first drawn ball is white = $P(W_2/W_1) = \frac{5}{13}$

Since the drawing is made without replacement, $W_1$ and $W_2$ are dependent events.
Required probability of drawing both white balls,
$P(W_1 \cap W_2) = P(W_1) \times P(W_2/W_1) = \frac{6}{14} \times \frac{5}{13} = \frac{15}{91}$

Example 4.32 | Find the probability of drawing an ace, a king and a queen in that order from a pack of
cards in three consecutive draws, the cards drawn not being replaced.

Solution: Let A, K and Q denote the events of drawing an ace, a king and a queen respectively.

The probability of drawing an ace, a king and a queen in that order in three consecutive draws, when
the drawn cards are not replaced (without replacement) in the pack of cards, is given by
$P(A \cap K \cap Q) = P(A) \times P(K/A) \times P(Q/A \cap K)$

where, P(A) = Probability of drawing an ace = $\frac{4}{52}$

P(K/A) = Probability of drawing a king given that an ace has already been drawn = $\frac{4}{51}$

$P(Q/A \cap K)$ = Probability of drawing a queen given that an ace and a king have already been drawn = $\frac{4}{50}$

$P(A \cap K \cap Q) = P(A) \times P(K/A) \times P(Q/A \cap K) = \frac{4}{52} \times \frac{4}{51} \times \frac{4}{50} = \frac{64}{132600} = \frac{8}{16575}$

were found. 


Total respondents 


In favour of both A and B 


in fay 

woul Our fi 

dhave the highest we 2 Which is slight!) 
8 est Chance to win incoming 


uM 


s 
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tw 
Example 4.34 | If A and B are two events such that P(A) = 0.4, P(B) = 0.6 and P(B/A) = 0.5, find P(A/B)
and $P(A \cup B)$.

Solution: P(A) = 0.4, P(B) = 0.6, P(B/A) = 0.5
Since $P(B/A) = \frac{P(A \cap B)}{P(A)}$, we have $0.5 = \frac{P(A \cap B)}{0.4}$
$P(A \cap B)$ = 0.5 × 0.4 = 0.2
Again, $P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{0.2}{0.6} = 0.333$

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$ = 0.4 + 0.6 - 0.2 = 0.8.


Example 4.35 | A bag contains 5 white and 3 black balls. Two balls are drawn at random one after the
other without replacement. Find the probability that the balls drawn are (i) both white, (ii) both black, (iii) of
different colours (one white and one black), (iv) of the same colour.

Solution: (i) Let $W_1$ = Event of drawing a white ball in the first draw
$W_2$ = Event of drawing a white ball in the second draw

The probability of drawing a white ball in the first draw is $P(W_1) = \frac{5}{8}$

The probability of drawing a white ball in the second draw given that the first
ball drawn is white is

$P(W_2/W_1) = \frac{4}{7}$

Thus, the probability that both balls drawn are white is given by

$P(W_1 \cap W_2) = P(W_1) \times P(W_2/W_1) = \frac{5}{8} \times \frac{4}{7} = \frac{5}{14}$

(Since the drawing is made without replacement, i.e. $W_1$ and $W_2$ are dependent events)

(ii) Let $B_1$ = Event of drawing a black ball in the first draw
$B_2$ = Event of drawing a black ball in the second draw

The probability of drawing a black ball in the first draw is $P(B_1) = \frac{3}{8}$

The probability of drawing a black ball in the second draw given that the first ball drawn is black is

$P(B_2/B_1) = \frac{2}{7}$

The probability that both balls drawn are black is given by

$P(B_1 \cap B_2) = P(B_1) \times P(B_2/B_1) = \frac{3}{8} \times \frac{2}{7} = \frac{3}{28}$

(Since the drawing is made without replacement, i.e. $B_1$ and $B_2$ are dependent events)

(iii) The probability of different colours is given by

P(different colours) = P(one white and one black)
= P(first drawn ball is white and second drawn ball is black, or first drawn
ball is black and second drawn ball is white)
$= P(W_1 \cap B_2 \text{ or } B_1 \cap W_2) = P(W_1 \cap B_2) + P(B_1 \cap W_2)$
$= P(W_1) \times P(B_2/W_1) + P(B_1) \times P(W_2/B_1)$
$= \frac{5}{8} \times \frac{3}{7} + \frac{3}{8} \times \frac{5}{7} = \frac{15}{28}$

(iv) The probability of the same colour is given by
P(same colours)
= P(both are white or both are black)
= P(both are white) + P(both are black)
$= P(W_1 \cap W_2) + P(B_1 \cap B_2)$
$= P(W_1) \times P(W_2/W_1) + P(B_1) \times P(B_2/B_1) = \frac{5}{14} + \frac{3}{28} = \frac{13}{28}$
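Drawing without replacement can also be checked by listing every ordered pair of distinct balls, which avoids the conditional probabilities altogether. The following sketch does this for the bag of 5 white and 3 black balls; the labels and the helper `prob` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

balls = ["W"] * 5 + ["B"] * 3

# Every ordered pair of distinct balls: 8 choices, then 7 choices.
ordered_draws = list(permutations(range(len(balls)), 2))

def prob(event):
    """m/n, where `event` is a predicate on (first colour, second colour)."""
    m = sum(1 for i, j in ordered_draws if event(balls[i], balls[j]))
    return Fraction(m, len(ordered_draws))

p_both_white = prob(lambda first, second: first == "W" and second == "W")
p_both_black = prob(lambda first, second: first == "B" and second == "B")
p_different = prob(lambda first, second: first != second)
p_same = prob(lambda first, second: first == second)
```

Because `permutations` never pairs a ball with itself, the listing reflects the "without replacement" condition automatically.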


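Example 4.36 | A bag contains 5 white and 8 red balls. Two drawings of 3 balls are made such that (i) the
balls are replaced before the second trial and (ii) the balls are not replaced before the second trial.
Find the probability that the first drawing will give 3 white and the second 3 red balls in each case.

Solution: Total number of balls = 5 + 8 = 13 balls

Let A and B denote the events of drawing 3 white balls in the first draw and 3 red balls in the second
draw respectively.

(i) With replacement: If the balls drawn in the first draw are replaced before the second draw, A and B are
independent events, so

$P(A \cap B) = P(A) \times P(B) = \frac{{}^{5}C_3}{{}^{13}C_3} \times \frac{{}^{8}C_3}{{}^{13}C_3} = \frac{10}{286} \times \frac{56}{286} = 0.0068$

(ii) Without replacement: If the balls drawn in the first draw are not replaced before the second draw, A and B
are dependent events, so

$P(A \cap B) = P(A) \times P(B/A)$

where, $P(A) = \frac{{}^{5}C_3}{{}^{13}C_3} = \frac{10}{286}$

P(B/A) = Probability of drawing 3 red balls from the bag containing 2 white and 8 red balls = $\frac{{}^{8}C_3}{{}^{10}C_3} = \frac{56}{120}$

$P(A \cap B) = \frac{10}{286} \times \frac{56}{120} = 0.0163$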


Example 4.37 | An urn X contains 3 white balls and 5 black balls. Another urn Y contains 6 white balls
and 4 black balls. A ball is transferred from urn X to urn Y and then a ball is taken from urn Y.
Find the probability that it will be a white ball.

Solution:

Urn X: 3 white, 5 black. A ball is transferred from urn X to urn Y. Urn Y: 6 white, 4 black.

P(white ball) = ?

Let $W_1$ and $B_1$ be the events of drawing a white ball and a black ball from urn X respectively.

Then, $P(W_1) = \frac{3}{3+5} = \frac{3}{8}$, $P(B_1) = \frac{5}{3+5} = \frac{5}{8}$

Let W be the event of drawing a white ball from urn Y after transferring a ball from urn X to urn Y.
Probability of drawing a white ball from urn Y given that a white ball is transferred from urn X to urn Y is

$P(W/W_1) = \frac{6+1}{(6+1)+4} = \frac{7}{11}$
Probability of drawing a white ball from urn Y given that a black ball is transferred from urn X to
urn Y is

$P(W/B_1) = \frac{6}{6+(4+1)} = \frac{6}{11}$
Hence, the probability of drawing a white ball from urn Y after transferring a ball from urn X to urn
Y is given by
$P(W) = P(W_1 \cap W) + P(B_1 \cap W)$

$= P(W_1) \times P(W/W_1) + P(B_1) \times P(W/B_1) = \frac{3}{8} \times \frac{7}{11} + \frac{5}{8} \times \frac{6}{11} = \frac{51}{88}$
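The two-stage urn calculation above conditions on the colour of the transferred ball and then adds the two cases. A minimal Python sketch of that reasoning follows; the function name `white_after_transfer` and its parameter names are assumptions introduced for illustration.

```python
from fractions import Fraction

def white_after_transfer(x_white, x_black, y_white, y_black):
    """P(white ball drawn from urn Y) after one random ball moves from X to Y."""
    # Stage 1: colour of the ball transferred out of urn X.
    p_w1 = Fraction(x_white, x_white + x_black)
    p_b1 = 1 - p_w1
    # Stage 2: urn Y now holds one extra ball of the transferred colour.
    y_total = y_white + y_black + 1
    p_w_given_w1 = Fraction(y_white + 1, y_total)
    p_w_given_b1 = Fraction(y_white, y_total)
    # Add the two mutually exclusive routes to a white ball.
    return p_w1 * p_w_given_w1 + p_b1 * p_w_given_b1

p_white = white_after_transfer(3, 5, 6, 4)
```

For the urns in the example this gives 51/88, which is about 0.58.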


4.5 Probability Distribution
A probability distribution is the distribution of probability over the values of a random variable under certain
conditions, a mathematical expression and theoretical considerations. There are two types of probability
distribution: discrete probability distribution and continuous probability distribution.

4.5.1 Binomial Distribution

The binomial distribution is a widely used probability distribution of a discrete random variable. It was
introduced in 1705 by James Bernoulli. The binomial distribution describes the possible number of times
that a particular event will occur in a sequence of trials. The distribution has four conditions:


4.5.2 Conditions of Binomial Distribution

1. The number of trials (n) should be fixed, finite and given.
2. Each trial is a dichotomous case with only two outcomes, such as yes and no, passed and failed,
   defective and non-defective, good and bad, or 1 and 0 [p + q = 1].
3. The probability is fixed, i.e., the probability of success in each trial is constant and does not change
   from trial to trial.
4. The trials are independent of each other.

4.5.3 Definition of Binomial Distribution

A discrete random variable X is said to follow the binomial distribution if it takes the probability mass
function

$P(X = r) = {}^{n}C_r \, p^r q^{n-r}$; r = 0, 1, 2, ..., n
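Four unbiased coins are tossed simultaneously. What is the probability of getting (i) 2
heads and 2 tails, (ii) at least two heads and (iii) at least one head?

Solution: Number of coins = n = 4 (unbiased coins)

Probability of getting a head in each coin (p) = $\frac{1}{2}$

Probability of getting a tail in each coin (q) = $\frac{1}{2}$

Let r be the number of heads.
Then the probability of getting r heads is

$P(r) = {}^{4}C_r \left(\frac{1}{2}\right)^r \left(\frac{1}{2}\right)^{4-r}$

(i) If r = 2 then heads = 2 and tails = 2 for four coins,
so, $P(r = 2) = {}^{4}C_2 \left(\frac{1}{2}\right)^2 \left(\frac{1}{2}\right)^{4-2} = 6 \times \frac{1}{16} = \frac{3}{8} = 0.375$

(ii) $P(X \geq 2)$ = At least two heads = 1 - [P(0) + P(1)]

$= 1 - \left[{}^{4}C_0 \left(\frac{1}{2}\right)^0 \left(\frac{1}{2}\right)^4 + {}^{4}C_1 \left(\frac{1}{2}\right)^1 \left(\frac{1}{2}\right)^3\right] = 1 - \left[\frac{1}{16} + \frac{4}{16}\right] = \frac{11}{16}$

(iii) $P(X \geq 1)$ = At least one head = 1 - P(0) $= 1 - \frac{1}{16} = \frac{15}{16}$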

4.5.4 Characteristics of Binomial Distribution

1. The binomial distribution has two parameters, n and p.
2. In the binomial distribution, the mean is always greater than the variance.
3. The binomial distribution is symmetrical if p = 1/2.
4. The binomial distribution is positively skewed if p < 1/2 and negatively skewed if p > 1/2.
5. In the binomial distribution, Mean = np and Variance = npq.

Expected frequency of binomial distribution

Suppose a random experiment consists of n trials satisfying the conditions of the binomial distribution.
If it is repeated N times, then the expected frequency of getting exactly r successes is given by

$f(r) = N \times P(X = r) = N \times {}^{n}C_r \, p^r q^{n-r}$

The expected frequencies for different values of r = 0, 1, 2, ..., n are

No. of successes (X = r) | $P(X = r) = {}^{n}C_r \, p^r q^{n-r}$ | Expected frequency f(r) = N × P(X = r)
0 | $P(X = 0) = {}^{n}C_0 \, p^0 q^{n-0}$ | f(0) = N × P(X = 0)
1 | $P(X = 1) = {}^{n}C_1 \, p^1 q^{n-1}$ | f(1) = N × P(X = 1)
2 | $P(X = 2) = {}^{n}C_2 \, p^2 q^{n-2}$ | f(2) = N × P(X = 2)
... | ... | ...
n | $P(X = n) = {}^{n}C_n \, p^n q^{n-n}$ | f(n) = N × P(X = n)

Thus, the process of finding the expected frequencies is called fitting of the binomial distribution.

If the probability of success 'p' is not known, then first of all we calculate the mean of the given
frequency distribution as $\bar{X} = \frac{\sum fX}{N}$ and equate it to the mean of the binomial distribution np. That is, $np = \bar{X}$.
Thus, the probability of success 'p' is estimated as $p = \frac{\bar{X}}{n}$ and then the probability of failure (q) = 1 - p.


Example 4.39 | Five coins are tossed 3200 times. Find the expected frequencies of the distribution of heads and tails and tabulate the results. Calculate the mean number of successes and the standard deviation.

Solution: Given, number of coins (n) = 5
Total number of trials (N) = 3200, Probability of getting head or success (p) = 0.5
Probability of getting tail or failure (q) = 1 - 0.5 = 0.5
The expected or theoretical frequency of getting 'r' heads is denoted by f(r) and is given by
$f(r) = N \times P(X = r) = N \times {}^{n}C_{r}\, p^{r} q^{n-r}$; r = 0, 1, 2, ..., n
$= 3200 \times {}^{5}C_{r} \times (0.5)^{r} \times (0.5)^{5-r}$; r = 0, 1, 2, ..., 5

Now,
When r = 0, $f(0) = 3200 \times {}^{5}C_{0} \times (0.5)^{0} \times (0.5)^{5-0} = 100$
When r = 1, $f(1) = 3200 \times {}^{5}C_{1} \times (0.5)^{1} \times (0.5)^{5-1} = 500$
When r = 2, $f(2) = 3200 \times {}^{5}C_{2} \times (0.5)^{2} \times (0.5)^{5-2} = 1000$
When r = 3, $f(3) = 3200 \times {}^{5}C_{3} \times (0.5)^{3} \times (0.5)^{5-3} = 1000$
When r = 4, $f(4) = 3200 \times {}^{5}C_{4} \times (0.5)^{4} \times (0.5)^{5-4} = 500$
When r = 5, $f(5) = 3200 \times {}^{5}C_{5} \times (0.5)^{5} \times (0.5)^{5-5} = 100$

Hence, the expected frequency distribution of heads and tails is

No. of heads (X) | 0 | 1 | 2 | 3 | 4 | 5
Expected frequency | 100 | 500 | 1000 | 1000 | 500 | 100

Again, Mean number of success = np = 5 × 0.5 = 2.5
Standard deviation = $\sqrt{npq} = \sqrt{5 \times 0.5 \times 0.5} = 1.118$
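The arithmetic of this example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are chosen here for illustration.

```python
from math import comb, sqrt

N, n, p = 3200, 5, 0.5      # repetitions, coins per toss, P(head)
q = 1 - p

# f(r) = N * nCr * p^r * q^(n-r) for r = 0..5
expected = [N * comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]
mean = n * p                # np
sd = sqrt(n * p * q)        # sqrt(npq)

print(expected)
print(mean, round(sd, 3))
```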


Example 4.40 | Seven coins are tossed and the number of heads noted. The experiment is repeated 128 times and the following distribution is obtained:

No. of heads

Fit a binomial distribution assuming
i) The coins are unbiased   ii) The nature of the coins is not known

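Solution:
i) When the coins are unbiased:

Number of coins (n) = 7, Probability of getting head or success (p) = 0.5
Probability of getting tail or failure (q) = 1 - 0.5 = 0.5

The expected or theoretical frequency of getting 'r' heads is denoted by f(r) and is given by
$f(r) = N \times P(X = r) = N \times {}^{n}C_{r}\, p^{r} q^{n-r}$; r = 0, 1, 2, ..., 7
$= 128 \times {}^{7}C_{r} \times (0.5)^{r} \times (0.5)^{7-r}$; r = 0, 1, 2, ..., 7
Now,
When r = 0, $f(0) = 128 \times {}^{7}C_{0} \times (0.5)^{0} \times (0.5)^{7-0} = 1$
When r = 1, $f(1) = 128 \times {}^{7}C_{1} \times (0.5)^{1} \times (0.5)^{7-1} = 7$
When r = 2, $f(2) = 128 \times {}^{7}C_{2} \times (0.5)^{2} \times (0.5)^{7-2} = 21$
When r = 3, $f(3) = 128 \times {}^{7}C_{3} \times (0.5)^{3} \times (0.5)^{7-3} = 35$
When r = 4, $f(4) = 128 \times {}^{7}C_{4} \times (0.5)^{4} \times (0.5)^{7-4} = 35$
When r = 5, $f(5) = 128 \times {}^{7}C_{5} \times (0.5)^{5} \times (0.5)^{7-5} = 21$
When r = 6, $f(6) = 128 \times {}^{7}C_{6} \times (0.5)^{6} \times (0.5)^{7-6} = 7$
When r = 7, $f(7) = 128 \times {}^{7}C_{7} \times (0.5)^{7} \times (0.5)^{7-7} = 1$
Hence, the expected frequency distribution of heads and tails is

No. of heads (X) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
Expected frequency | 1 | 7 | 21 | 35 | 35 | 21 | 7 | 1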
ii) When the nature of the coin is not known:
Number of trials (N) = 128

Calculation of mean of the distribution

[Table: No. of heads (X) with observed frequencies; ΣfX = 433, N = 128]

Mean $(\bar{X}) = \frac{\sum fX}{N} = \frac{433}{128} = 3.3828$, and $np = \bar{X}$
Probability of getting head or success (p) = $\bar{X}/n = 3.3828/7 \approx 0.48$
Probability of getting tail or failure (q) = 1 - p ≈ 0.52


Now, r=0,1,2,.. 


When r= 0, f(0) = 128 x 7c, x (0.52)? x (0.5)"-° =| 
When r= 1, f(1) = 128 xc, x (0.52)! x (0.5)! =7 
When r= 2, f(2) = 128 x ’C, x (0.52)? x (0,5)7-2 
When r = 3, f(3) = 128 x ’¢, x 0.52) . es es 
When r = 4, f(4) = 128 x 7C, x . 4 — ee 

4 * (0.52)* x (0.5)’~4 = 35 
When r = 5, f(5) = 128 x 'C5 x (0.52)° x (0.5)"-5 = 21 
When r = 6, f(6) = 128 x "Cg x (0.52)° x (0.5)’-6=7 
When r = 7, f(7) = 128 = ’C, x (0.52)’ x (0.5)’~7=1 

Hence, the expected frequency distribution of heads and tails is


No. of defective (X) 
Observed Frequency 
Expected Frequency 


Mean and variance of the number of defectives in the sample are
Mean = n × p = 7 × 0.5 = 3.5, Variance = n × p × q = 7 × 0.5 × 0.5 = 1.75


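Example 4.41 | If hens of a certain breed normally lay eggs on 5 days a week on an average, find how many days during a season of 100 days a poultry keeper with 5 hens of this breed will expect to receive (a) no eggs (b) exactly 2 eggs (c) at least 4 eggs (d) at most 1 egg?

Solution: Here, Number of hens (n) = 5, Total number of days (N) = 100
Probability of laying eggs or success (p) = 5/7
Probability of not laying eggs or failure (q) = 1 - 5/7 = 2/7
Let X denote the number of eggs, then using binomial distribution, the probability of exactly r eggs is given by
$P(X = r) = {}^{n}C_{r}\, p^{r} q^{n-r} = {}^{5}C_{r} \left(\frac{5}{7}\right)^{r} \left(\frac{2}{7}\right)^{5-r}$; r = 0, 1, 2, ..., 5

i) The probability of receiving no eggs = $P(X=0) = {}^{5}C_{0} \left(\frac{5}{7}\right)^{0} \left(\frac{2}{7}\right)^{5-0} = 0.0019$
The expected number of days for no eggs out of 100 days is
f(0) = N × P(X = 0) = 100 × 0.0019 = 0.19 ≈ 0

ii) The probability of receiving exactly 2 eggs = $P(X=2) = {}^{5}C_{2} \left(\frac{5}{7}\right)^{2} \left(\frac{2}{7}\right)^{3} = 0.1190$
The expected number of days for exactly 2 eggs out of 100 days is
f(2) = N × P(X = 2) = 100 × 0.1190 = 11.9 ≈ 12

iii) The probability of receiving at least 4 eggs is
P(X ≥ 4) = P(X = 4) + P(X = 5)
$= {}^{5}C_{4} \left(\frac{5}{7}\right)^{4} \left(\frac{2}{7}\right)^{1} + {}^{5}C_{5} \left(\frac{5}{7}\right)^{5} \left(\frac{2}{7}\right)^{0}$
= 0.3719 + 0.1859 = 0.5578
The expected number of days for at least 4 eggs out of 100 days is
N × P(X ≥ 4) = 100 × 0.5578 = 55.78 ≈ 56

iv) The probability of receiving at most 1 egg is
P(X ≤ 1) = P(X = 0) + P(X = 1) = 0.0019 + 0.0238 = 0.0257
The expected number of days for at most 1 egg out of 100 days is
N × P(X ≤ 1) = 100 × 0.0257 = 2.57 ≈ 3

Example 4.42 | The probability of a bomb hitting a target is 1/5. Two bombs are enough to destroy a bridge. If 6 bombs are aimed at the bridge, find the probability that the bridge is destroyed.

Solution: n = 6, p = 1/5, q = 4/5 (since, p + q = 1)

$P(X = x) = P(x) = {}^{n}C_{x}\, p^{x} q^{n-x} = {}^{6}C_{x} \left(\frac{1}{5}\right)^{x} \left(\frac{4}{5}\right)^{6-x}$
P(X ≥ 2) = 1 - P(X < 2) = 1 - [P(X = 0) + P(X = 1)]
= 1 - (0.262144 + 0.393216) = 0.345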
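The two examples above both add up a few binomial terms. A short Python sketch of that calculation is given below; the helper name `binom_pmf` is chosen here for illustration and is not part of the text.

```python
from math import comb

def binom_pmf(r, n, p):
    """P(X = r) for a binomial variate with n trials and success probability p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Bridge example: 6 bombs, p = 1/5, destroyed when at least 2 bombs hit.
p_destroyed = 1 - (binom_pmf(0, 6, 0.2) + binom_pmf(1, 6, 0.2))

# Hen example: 5 hens, p = 5/7, season of 100 days.
days = 100
p_at_most_one = binom_pmf(0, 5, 5 / 7) + binom_pmf(1, 5, 5 / 7)
p_at_least_four = binom_pmf(4, 5, 5 / 7) + binom_pmf(5, 5, 5 / 7)

print(round(p_destroyed, 3))
print(round(days * p_at_most_one, 2), round(days * p_at_least_four, 2))
```

Using the complement 1 - [P(0) + P(1)] for "at least 2" keeps the sum to two terms instead of five.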
4.6 Poisson Distribution 


4.6.1 Introduction 


A Poisson distribution is a tool that helps to predict the probability of certain events happening when you know how often the event has occurred. The distribution gives the probability of a given number of events happening in a fixed interval of time or volume. It was introduced by French Mathematician Simeon Denis Poisson in 1837 A.D.

Unlike binomial distribution, Poisson distribution cannot be deduced on purely theoretical grounds based on the conditions of the experiment. It is used as an approximation of the binomial distribution when the number of trials is large and the probability of success for each trial is less than or equal to 0.05.

This distribution deals with the evaluation of probabilities of rare events such as "no. of accidents on road", "no. of earthquakes in a year", "no. of printing mistakes per page", etc.

Attributes of a Poisson Experiment

A Poisson experiment is a statistical experiment that has the following properties:

The experiment results in outcomes that can be classified as successes or failures.

4.6.2 Notation

The following notation is helpful, when we talk about the Poisson distribution.

e: A constant equal to approximately 2.71828. (Actually e is the base of the natural logarithm system.)
λ: The mean number of successes that occur in a specified region (parameter of the distribution).
n: The total number of trials.
p: The probability of success for each trial.
r: The no. of occurrences of a given event.

4.6.3 Definition of Poisson Distribution

Suppose we conduct a Poisson experiment, in which the average number of successes within a given region is λ. Then, the Poisson probability is:

$P(r) = \frac{e^{-\lambda} \lambda^{r}}{r!}$

Where, r = The actual number of successes that result from the experiment,
e = Approximately equal to 2.71828 (exponential constant).
λ = Mean of Poisson distribution = Variance of Poisson distribution

In Poisson distribution mean is equal to the variance of distribution. It is also limiting case of Binomial distribution in which n → ∞ and p → 0 (rare cases), with np = λ remaining finite.
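The limiting relationship can be illustrated numerically. The sketch below compares binomial probabilities for a large n and small p with the Poisson probabilities for λ = np; the choice n = 1000 and λ = 3 is an assumption made here purely for illustration.

```python
from math import comb, exp, factorial

def poisson_pmf(r, lam):
    """P(r) = e^(-lam) * lam^r / r!"""
    return exp(-lam) * lam**r / factorial(r)

def binom_pmf(r, n, p):
    """P(X = r) for a binomial variate."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

lam = 3.0
n = 1000            # assumed large number of trials
p = lam / n         # small probability of success, so that np = lam

for r in range(6):
    print(r, round(binom_pmf(r, n, p), 4), round(poisson_pmf(r, lam), 4))
```

The two columns agree to about three decimal places, which is the sense in which the Poisson distribution approximates the binomial for rare events.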


4.6.4 Real World Application of Poisson Distribution 

Whether one observes patients arriving at an emergency room, cars driving up to a gas station, bank customers coming to their bank or shoppers (clients) being served at a cash register, the streams of such events typically follow the Poisson process. The underlying assumption is that the events are statistically independent and the rate, λ, of these events (the expected number of the events per time unit) is constant. The list of applications of the Poisson distribution is very long. To name just a few more:

1. The number of mutations on a given strand of DNA per time unit.
2. The number of bankruptcies that are recorded in a month.
3. The number of arrivals at a car wash in one hour.
4. The number of network failures per day.
5. The number of file server virus infections at a data center during a 24-hour period.
6. The number of Airbus 330 aircraft engine shutdowns per 100,000 flight hours.
7. The number of asthma patient arrivals in a given hour at a walk-in clinic.
8. The number of hungry persons entering McDonald's restaurant.
9. The number of work-related accidents over a given production time.
10. The number of births, deaths, marriages, divorces, suicides, and homicides over a given period of time.


Example 4.43 | The number of calls coming per minute into a hotel's reservation center is a Poisson random variable with mean 3.
(a) Find the probability that no calls come in a given 1 minute period.
(b) Find the probability that at least 2 calls come in a given 1 minute period.

Solution: Given Information:
Mean (λ) = 3

a) Find the probability that no calls come in a given 1 minute period.
Let "r" denote the number of calls coming in that given 1 minute period. Then,
r ~ Poisson(3)
P(r = 0) = ?
We know, $P(r) = \frac{e^{-\lambda} \lambda^{r}}{r!}$
or, $P(r = 0) = \frac{e^{-3} \cdot 3^{0}}{0!} = e^{-3} = (2.71828)^{-3} = 0.0498 = 4.98\%$
Therefore, the probability that no calls come in a given 1 minute period is 4.98%.

b) Find the probability that at least 2 calls come in a given 1 minute period.
Let "r" denote the number of calls coming in that given 1 minute period. Then,
r ~ Poisson(3)
P(r ≥ 2) = ?
We know,
P(r ≥ 2) = 1 - P(r < 2) = 1 - [P(0) + P(1)]
$= 1 - \left[\frac{e^{-3} \cdot 3^{0}}{0!} + \frac{e^{-3} \cdot 3^{1}}{1!}\right]$
= 1 - [0.0498 + 0.1494] = 0.8008
Therefore, the probability that at least 2 calls will come in a given 1 minute period is 80.08%.
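The same two probabilities can be computed directly. This is a small sketch using only the Python standard library; the helper name is chosen here for illustration.

```python
from math import exp, factorial

def poisson_pmf(r, lam):
    """P(r) = e^(-lam) * lam^r / r!"""
    return exp(-lam) * lam**r / factorial(r)

lam = 3                                     # mean number of calls per minute
p_none = poisson_pmf(0, lam)                # part (a)
p_at_least_two = 1 - (poisson_pmf(0, lam) + poisson_pmf(1, lam))   # part (b)

print(round(p_none, 4), round(p_at_least_two, 4))
```

Because the code does not round the individual terms before subtracting, the second value may differ from the hand calculation in the fourth decimal place.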


4.7 Normal Distribution

The Binomial and Poisson distributions discussed so far are discrete probability distributions because the random variables involved are discrete. The normal probability distribution, simply called normal distribution, is one of the most important continuous theoretical distributions in statistics. Because of its characteristics, most of the problems relating to economics, business or even social and physical science conform to this distribution.

It was first obtained by De Moivre in 1733 in connection with problems arising in the game of chance. It is also called Gaussian distribution (Gaussian law of errors) after Karl Friedrich Gauss (1777 - 1855), who used it to describe the theory of accidental errors.

A continuous random variable X, ranging between -∞ and +∞, is said to follow normal distribution with mean μ and standard deviation σ if its probability density function is given by

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty$

where
X = values of continuous random variable.
π = 22/7, e = 2.7183 are constants, μ = Mean of normal distribution
σ = Standard deviation of normal distribution

The mean μ and standard deviation σ are called parameters of the normal distribution. Symbolically, X follows normal distribution with mean μ and variance σ² is written as X ~ N(μ, σ²).

The graph of the normal curve is symmetrical and bell shaped and extends indefinitely in both directions. The left and right hand tails extend without touching the horizontal axis and the curve is symmetrical about a vertical line erected at the mean.

(i) P(X) ≥ 0 (ii) ΣP(X) = 1 (Total probability area is 1)

Figure: Frequency curve of the Normal Probability Distribution (X = μ; Mean = Median = Mode)


4.7.2 Properties of Normal Distribution 


The normal probability curve with parameters μ and σ has the following properties:

1. The curve is bell shaped and symmetrical about the line X = μ (i.e. z = 0), $\beta_2 = 3$ and $\gamma_2 = 0$.
2. Mean, Median and Mode of the distribution are equal.
3. Area under the normal curve is unity and the mean divides it into two equal parts, 0.5 for each part.
4. The curve has a single peak, thus it is unimodal.
5. Since the probability can never be negative, no portion of the curve lies below the x-axis.
6. The two tails of the normal curve extend indefinitely and never touch the horizontal axis (i.e. the curve tapers off towards the x-axis).
7. Points of inflexion (points at which the curve changes its direction) of the normal curve are at X = μ ± σ, i.e. they are equidistant from mean at a distance of σ.
8. A linear combination of independent normal variates is also a normal variate, i.e. if $X_1, X_2, \ldots, X_n$ are independent normal variates with means $\mu_1, \mu_2, \ldots, \mu_n$ and standard deviations $\sigma_1, \sigma_2, \ldots, \sigma_n$ respectively, then their linear combination $a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$, where $a_1, a_2, \ldots, a_n$ are constants, is also a normal variate with:
   Mean $(\mu) = a_1 \mu_1 + a_2 \mu_2 + \cdots + a_n \mu_n$
   and Variance $(\sigma^2) = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + \cdots + a_n^2 \sigma_n^2$
   Thus, the sum (or difference) of independent normal variates is a normal variate.
9. Normal distribution is a limiting case of binomial and Poisson distribution.
10. In a normal distribution the quartiles (first and third) are equidistant from median, i.e. $Q_1 + Q_3 = 2Md = 2\mu$.
11. Area property of normal distribution: The area under the normal probability curve between the ordinates at μ - σ and μ + σ is 0.6826. In other words, the range μ ± σ covers 68.26% of the observations, or
    P(μ - σ < X < μ + σ) = 0.6826
    Similarly, P(μ - 2σ < X < μ + 2σ) = 0.9544
    P(μ - 3σ < X < μ + 3σ) = 0.9974, which is almost unity.
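The area property can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function name `area_within` is introduced here for illustration. The printed values may differ from the table-based figures above in the fourth decimal place, because the table values are rounded.

```python
from statistics import NormalDist

def area_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma < X < mu + k*sigma) for a normal variate X."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```

The result does not depend on the particular values of μ and σ, which is why a single standard normal table is enough for every normal distribution.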


4.7.3 Standard Normal Distribution 
Standard normal distribution is a special case of the normal distribution. If a random variable X follows normal distribution with mean μ and standard deviation σ, then the random variable Z defined as

$Z = \frac{X - \mu}{\sigma}$

is called standard normal variate.

The expected value and variance of the standard normal variate Z are given by

E(Z) = 0, V(Z) = 1

Symbolically, we write Z ~ N(0, 1) to denote that the standard normal variate follows normal distribution with mean 0 and standard deviation 1.

How to compute areas under normal probability curve?

Mathematically, the area bounded by the curve f(x), the X-axis and the ordinates X = a and X = b is given by integrating f(x) over the limits [a, b] and is denoted as

$P(a \le X \le b) = \int_{a}^{b} f(x)\, dx$

Computation of area to the right of the ordinate at X = a, i.e., to find P(X > a)

Case (i): a > μ, i.e., a is to the right of the mean ordinate at X = μ.

When X = a, $Z = \frac{X - \mu}{\sigma} = \frac{a - \mu}{\sigma} = z_1$ (say)

Now, P(X > a) = P(Z > z1) = P(0 < Z < ∞) - P(0 < Z < z1)
= 0.5 - P(0 < Z < z1)

The value of probability P(0 < Z < z1) is obtained from the table of area under standard normal curve.

Case (ii): a < μ, i.e., a is to the left of the mean ordinate at X = μ.

When X = a, $Z = \frac{X - \mu}{\sigma} = \frac{a - \mu}{\sigma} = -z_1$ (say)

Now, P(X > a) = P(Z > -z1)
= P(-z1 < Z < 0) + P(0 < Z < ∞)
= P(0 < Z < z1) + 0.5

The value of probability P(0 < Z < z1) is obtained from the table of area under standard normal curve.

Computation of area to the left of the ordinate at X = b, i.e., to find P(X < b)

Case (i): b > μ, i.e., b is to the right of the mean ordinate at X = μ.

When X = b, $Z = \frac{X - \mu}{\sigma} = \frac{b - \mu}{\sigma} = z_2$ (say)

Now, P(X < b) = P(Z < z2) = P(-∞ < Z < 0) + P(0 < Z < z2)
= 0.5 + P(0 < Z < z2)

The value of probability P(0 < Z < z2) is obtained from the table of area under standard normal curve.

Case (ii): b < μ, i.e., b is to the left of the mean ordinate at X = μ.

When X = b, $Z = \frac{X - \mu}{\sigma} = \frac{b - \mu}{\sigma} = -z_2$ (say)

Now, P(X < b) = P(Z < -z2) = P(-∞ < Z < 0) - P(-z2 < Z < 0)
= 0.5 - P(0 < Z < z2)

The value of probability P(0 < Z < z2) is obtained from the table of area under standard normal curve.
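The four cases above can be sketched in Python. Here the table lookup P(0 < Z < z) is replaced by the error function from the standard library, using the identity P(0 < Z < z) = 0.5 · erf(z/√2); the function names and the values μ = 50, σ = 10 in the usage lines are assumptions made for illustration only.

```python
from math import erf, sqrt

def area_0_to_z(z):
    """P(0 < Z < |z|), the quantity tabulated in a standard normal table."""
    return 0.5 * erf(abs(z) / sqrt(2))

def right_tail(a, mu, sigma):
    """P(X > a), following Case (i) and Case (ii) for the right-hand area."""
    z = (a - mu) / sigma
    if z >= 0:                         # a is to the right of the mean
        return 0.5 - area_0_to_z(z)
    return 0.5 + area_0_to_z(z)        # a is to the left of the mean

def left_tail(b, mu, sigma):
    """P(X < b), following Case (i) and Case (ii) for the left-hand area."""
    z = (b - mu) / sigma
    if z >= 0:                         # b is to the right of the mean
        return 0.5 + area_0_to_z(z)
    return 0.5 - area_0_to_z(z)        # b is to the left of the mean

# Hypothetical usage with mu = 50 and sigma = 10.
print(round(right_tail(60, 50, 10), 4))
print(round(left_tail(40, 50, 10), 4))
```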
Example 4.44 | Given a standardized normal distribution, determine the following probabilities: (i) P(Z < 1.96), (ii) P(Z > 1.64), (iii) P(0 < Z < 2.34), (iv) P(Z < -1.64), (v) P(Z > -0.34), (vi) P(-1.25 < Z < 0), (vii) P(- ... 4), (viii) P(0.17 < Z < 1.64), (ix) Z is less than -0.84 or greater than +2.08, and (x) what is the value of Z if only 31.87% of all possible Z values are smaller?

Solution:

i) P(Z < 1.96) = P(-∞ < Z < 0) + P(0 < Z < 1.96)
= 0.50 + 0.4750 = 0.9750

ii) P(Z > 1.64) = P(1.64 < Z < ∞)
= P(0 < Z < ∞) - P(0 < Z < 1.64)
= 0.50 - 0.4495 = 0.0505

iii) P(0 < Z < 2.34) = 0.4904

iv) P(Z < -1.64) = P(-∞ < Z < 0) - P(-1.64 < Z < 0)
= 0.50 - 0.4495 = 0.0505

v) P(Z > -0.34) = P(-0.34 < Z < 0) + P(0 < Z < ∞)
= 0.1331 + 0.50 = 0.6331

vi) P(-1.25 < Z < 0) = P(0 < Z < 1.25) [By symmetry]
= 0.3944

vii) P(- ... 4)

viii) P(0.17 < Z < 1.64) = P(0 < Z < 1.64) - P(0 < Z < 0.17)
= 0.4495 - 0.0675 = 0.382

ix) The probability of Z is less than -0.84 or greater than +2.08 is
= P(Z < -0.84) + P(Z > 2.08)
= [P(-∞ < Z < 0) - P(-0.84 < Z < 0)] + [P(0 < Z < ∞) - P(0 < Z < 2.08)]
= [0.50 - P(0 < Z < 0.84)] + [0.50 - 0.4812]
= 0.50 - 0.2995 + 0.50 - 0.4812 = 0.2193

x) Let the value of Z be -z1 such that only 31.87% of all values of Z are smaller than -z1, i.e.,
P(Z < -z1) = 0.3187
or, P(-∞ < Z < 0) - P(-z1 < Z < 0) = 0.3187
or, 0.50 - P(0 < Z < z1) = 0.3187
or, P(0 < Z < z1) = 0.50 - 0.3187
or, P(0 < Z < z1) = 0.1813

The value of probability closer to 0.1813 in Z-table is at Z = 0.47.
∴ z1 = 0.47


Example 4.45 | Given a normal distribution with mean μ = 200 and σ = 20, find the probability that (i) P(X > 180), (ii) P(X < 220), (iii) P(160 < X < 240), (iv) P(X > 220), (v) P(X < 180 or X > 220), (vi) 10% of the values are less than the value of X?

Solution: Here, let X be the normal variate that follows normal distribution with mean (μ) = 200 and standard deviation (σ) = 20.

i) For P(X > 180): When X = 180
$Z = \frac{X - \mu}{\sigma} = \frac{180 - 200}{20} = -1$
P(X > 180) = P(Z > -1)
= P(-1 < Z < 0) + P(0 < Z < ∞)
= P(0 < Z < 1) + 0.50
= 0.3413 + 0.50 = 0.8413

ii) For P(X < 220): When X = 220
$Z = \frac{X - \mu}{\sigma} = \frac{220 - 200}{20} = 1$
P(X < 220) = P(Z < 1)
= P(-∞ < Z < 0) + P(0 < Z < 1)
= 0.50 + 0.3413 = 0.8413

iii) For P(160 < X < 240): When X = 160, $Z = \frac{160 - 200}{20} = -2$ and when X = 240, $Z = \frac{240 - 200}{20} = 2$
P(160 < X < 240) = P(-2 < Z < 2)
= P(-2 < Z < 0) + P(0 < Z < 2)
= 2 × P(0 < Z < 2) = 2 × 0.4772 = 0.9544

iv) For P(X > 220): When X = 220
$Z = \frac{X - \mu}{\sigma} = \frac{220 - 200}{20} = 1$
P(X > 220) = P(Z > 1)
= P(0 < Z < ∞) - P(0 < Z < 1)
= 0.50 - 0.3413
= 0.1587

v) For P(X < 180 or X > 220): When X = 180
$Z = \frac{X - \mu}{\sigma} = \frac{180 - 200}{20} = -1$
When X = 220
$Z = \frac{X - \mu}{\sigma} = \frac{220 - 200}{20} = 1$
P(X < 180 or X > 220)
= P(X < 180) + P(X > 220)
= P(Z < -1) + P(Z > 1)
= [P(-∞ < Z < 0) - P(-1 < Z < 0)] + [P(0 < Z < ∞) - P(0 < Z < 1)]
= [0.50 - P(0 < Z < 1)] + [0.50 - P(0 < Z < 1)]
= 0.1587 + 0.1587 = 0.3174

vi) Let x1 be the value of X such that 10% of the values are less than x1, i.e., P(X < x1) = 0.10
When X = x1, $Z = \frac{x_1 - 200}{20} = -z_1$ (say) ... (i)
P(Z < -z1) = 0.10
or, P(-∞ < Z < 0) - P(-z1 < Z < 0) = 0.10
or, 0.50 - P(0 < Z < z1) = 0.10
or, P(0 < Z < z1) = 0.40

The value of probability closer to 0.40 in Z-table is 0.3997 at Z = 1.28.
∴ z1 = 1.28

From equation (i),
$\frac{x_1 - 200}{20} = -1.28$
or, x1 = 200 - 1.28 × 20
or, x1 = 174.4

Hence, the required value of X is 174.4
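The answers to this example can be cross-checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is an illustrative sketch; the variable names are chosen here, and results may differ slightly from the table-based answers because the Z-table rounds to two decimal places in z.

```python
from statistics import NormalDist

X = NormalDist(mu=200, sigma=20)

p_i = 1 - X.cdf(180)                   # P(X > 180)
p_ii = X.cdf(220)                      # P(X < 220)
p_iii = X.cdf(240) - X.cdf(160)        # P(160 < X < 240)
p_iv = 1 - X.cdf(220)                  # P(X > 220)
p_v = X.cdf(180) + (1 - X.cdf(220))    # P(X < 180 or X > 220)
x_vi = X.inv_cdf(0.10)                 # value below which 10% of values lie

print(round(p_i, 4), round(p_ii, 4), round(p_iii, 4))
print(round(p_iv, 4), round(p_v, 4), round(x_vi, 1))
```

The `inv_cdf` call does in one step what the reverse table lookup does in part (vi).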


Example 4.46 | A Kathmandu municipality puts 10,000 light bulbs on the streets of a city. If lives of bulbs follow a normal distribution with a mean of 60 days and a standard deviation of 20 days, how many bulbs will have to be replaced after (i) 40 days? and (ii) 80 days?

Solution: Let X be the normal variate denoting the life of the electric bulbs that follows normal distribution with mean life of bulbs (μ) = 60 days.
Standard deviation of life of bulbs (σ) = 20 days
Here, total number of light bulbs (N) = 10000

i) The probability that the bulbs will have to be replaced after 40 days is P(X > 40).
For, X = 40
$Z = \frac{X - \mu}{\sigma} = \frac{40 - 60}{20} = -1$
P(X > 40) = P(Z > -1)
= P(-1 < Z < 0) + P(0 < Z < ∞)
= P(0 < Z < 1) + 0.5
= 0.3413 + 0.5 = 0.8413
Hence, the expected number of bulbs that will have to be replaced after 40 days = N × P(X > 40) = 10000 × 0.8413 = 8413

ii) The probability that the bulbs will have to be replaced after 80 days is P(X > 80).
For, X = 80,
$Z = \frac{X - \mu}{\sigma} = \frac{80 - 60}{20} = 1$
P(X > 80) = P(Z > 1)
= 0.5 - P(0 < Z < 1) = 0.5 - 0.3413 = 0.1587
Hence, the expected number of bulbs that will have to be replaced after 80 days = N × P(X > 80) = 10000 × 0.1587 = 1587


Example 4.47 | The mean and standard deviation of the wages of 6000 workers engaged in a factory are Rs. 1200 and Rs. 400 respectively. Assuming the distribution to be normal, estimate: i) the percentage of workers getting wages above Rs. 1600, ii) the number of workers getting wages between Rs. 400 and Rs. 800, iii) the number of workers getting wages between Rs. 1000 and Rs. 1400.

Solution: Let X be the normal variate denoting the wage of the workers in a factory that follows normal distribution with mean wages (μ) = Rs. 1200
Standard deviation of wages of workers (σ) = Rs. 400
Here, total number of workers (N) = 6000

i) The probability of workers getting wages above Rs. 1600 is P(X > 1600).
For, X = 1600,
$Z = \frac{X - \mu}{\sigma} = \frac{1600 - 1200}{400} = 1$
P(X > 1600) = P(Z > 1) = P(0 < Z < ∞) - P(0 < Z < 1)
= 0.5 - 0.3413 = 0.1587 = 15.87%
Hence, the percentage of workers getting wages above Rs. 1600 is 15.87%.

ii) The probability of workers getting wages between Rs. 400 and Rs. 800 is P(400 < X < 800).
For, X = 400
$Z = \frac{X - \mu}{\sigma} = \frac{400 - 1200}{400} = -2$
For, X = 800
$Z = \frac{X - \mu}{\sigma} = \frac{800 - 1200}{400} = -1$
P(400 < X < 800) = P(-2 < Z < -1) = P(-2 < Z < 0) - P(-1 < Z < 0)
= P(0 < Z < 2) - P(0 < Z < 1)
= 0.4772 - 0.3413 = 0.1359
Hence, the number of workers getting wages between Rs. 400 and Rs. 800 = N × P(400 < X < 800) = 6000 × 0.1359 = 815.4 ≈ 815

iii) The probability of workers getting wages between Rs. 1000 and Rs. 1400 is P(1000 < X < 1400).
For, X = 1000
$Z = \frac{X - \mu}{\sigma} = \frac{1000 - 1200}{400} = -0.5$
For, X = 1400
$Z = \frac{X - \mu}{\sigma} = \frac{1400 - 1200}{400} = 0.5$
P(1000 < X < 1400) = P(-0.5 < Z < 0.5)
= P(-0.5 < Z < 0) + P(0 < Z < 0.5)
= P(0 < Z < 0.5) + P(0 < Z < 0.5)
= 2 × P(0 < Z < 0.5) = 2 × 0.1915 = 0.3830
Hence, the number of workers getting wages between Rs. 1000 and Rs. 1400 = N × P(1000 < X < 1400) = 6000 × 0.3830 = 2298


Example 4.48 | An industrial sewing machine uses ball bearings that are targeted to have a diameter of 0.75 inch. The lower and upper specification limits under which the ball bearing can operate are 0.74 inch and 0.76 inch, respectively. Experience has indicated that the actual diameter of the ball bearings is approximately normally distributed with a mean of 0.753 inch and a standard deviation of 0.004 inch. What is the probability that a ball bearing will be (i) between the target and the actual mean, (ii) between the lower specification limit and the actual mean, (iii) above the upper specification limit, (iv) below the lower specification limit? (v) Above which value in diameter will 93% of the ball bearings be?

Solution: Let X be the normal variate denoting the diameters of the ball bearings that follows normal distribution with

Mean diameter of the ball bearings (μ) = 0.753 inch.
Standard deviation of the ball bearings (σ) = 0.004 inch.
Here, targeted value = 0.75, Lower specification limit = 0.74
Upper specification limit = 0.76

i) The probability that a ball bearing will be between the targeted value and the actual mean is P(0.75 < X < 0.753).
For, X = 0.75
$Z = \frac{X - \mu}{\sigma} = \frac{0.75 - 0.753}{0.004} = -0.75$
For, X = 0.753, $Z = \frac{0.753 - 0.753}{0.004} = 0$
P(0.75 < X < 0.753) = P(-0.75 < Z < 0)
= P(0 < Z < 0.75) = 0.2734

ii) The probability that a ball bearing will be between the lower specification limit and the actual mean is P(0.74 < X < 0.753).
For, X = 0.74, $Z = \frac{X - \mu}{\sigma} = \frac{0.74 - 0.753}{0.004} = -3.25$
For, X = 0.753, $Z = \frac{X - \mu}{\sigma} = \frac{0.753 - 0.753}{0.004} = 0$
∴ P(0.74 < X < 0.753) = P(-3.25 < Z < 0)
= P(0 < Z < 3.25) = 0.4994

iii) The probability that a ball bearing will be above the upper specification limit is P(X > 0.76).
For, X = 0.76
$Z = \frac{X - \mu}{\sigma} = \frac{0.76 - 0.753}{0.004} = 1.75$
P(X > 0.76) = P(Z > 1.75)
= P(0 < Z < ∞) - P(0 < Z < 1.75)
= 0.5 - 0.4599 = 0.0401

iv) The probability that a ball bearing will be below the lower specification limit is P(X < 0.74).
P(X < 0.74) = P(Z < -3.25)
= 0.5 - P(0 < Z < 3.25)
= 0.5 - 0.4994 = 0.0006

v) Let x1 be the value above which there are 93% of the ball bearings,
i.e., P(X > x1) = 0.93
For, X = x1
$Z = \frac{X - \mu}{\sigma} = \frac{x_1 - 0.753}{0.004} = -z_1$ ... (i)
∴ P(X > x1) = P(Z > -z1) = 0.93
or, P(-z1 < Z < 0) + P(0 < Z < ∞) = 0.93
or, P(0 < Z < z1) + 0.5 = 0.93
or, P(0 < Z < z1) = 0.43
The value of probability closer to 0.43 in the normal table is 0.4306 at Z = 1.48.
∴ z1 = 1.48

From equation (i),
$\frac{x_1 - 0.753}{0.004} = -1.48$
or, x1 = 0.753 - 1.48 × 0.004
or, x1 = 0.7471
Hence, the required value is 0.7471.


Example 4.49 | The marks obtained by 1000 students in an examination are known to be normally distributed. If 15% of the students got less than 30 marks and 10% of the students got over 90, find the mean and standard deviation of the distribution.

Solution: Let X be the normal variate denoting the marks, with mean μ and standard deviation σ.
According to question,
P(X < 30) = 0.15 ... (i)
P(X > 90) = 0.10 ... (ii)

From equation (i),
For, X = 30
$Z = \frac{X - \mu}{\sigma} = \frac{30 - \mu}{\sigma} = -z_1$ ... (iii)
P(Z < -z1) = 0.15
or, P(-∞ < Z < 0) - P(-z1 < Z < 0) = 0.15
or, 0.5 - P(0 < Z < z1) = 0.15 [By symmetry]
or, P(0 < Z < z1) = 0.35
The value of probability closer to 0.35 in the normal table is 0.3508 at Z = 1.04. Therefore z1 = 1.04

From equation (iii), we get
$\frac{30 - \mu}{\sigma} = -1.04$
or, μ - 1.04 × σ = 30 ... (iv)

Again, from equation (ii),
For, X = 90
$Z = \frac{X - \mu}{\sigma} = \frac{90 - \mu}{\sigma} = z_2$ ... (v)
P(Z > z2) = 0.10
or, P(0 < Z < ∞) - P(0 < Z < z2) = 0.10
or, P(0 < Z < z2) = 0.40
The value of probability closer to 0.40 in the normal table is 0.3997 at Z = 1.28. Therefore, z2 = 1.28

From equation (v), we get
$\frac{90 - \mu}{\sigma} = 1.28$ or, μ + 1.28 × σ = 90 ... (vi)

Subtracting equation (iv) from equation (vi), we get
2.32 σ = 60 ⇒ σ = 25.86
Now, substituting the value of σ = 25.86 in equation (iv), we get
μ - 1.04 × 25.86 = 30
or, μ = 30 + 1.04 × 25.86
or, μ = 56.89

Hence, the mean and standard deviation of the distribution are 56.89 and 25.86 respectively.
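The two simultaneous equations in this example can also be solved numerically. The sketch below uses `statistics.NormalDist` (Python 3.8 or later) to obtain the z values instead of a table, so the answers differ slightly from 56.89 and 25.86, which rest on z values rounded to two decimals. The variable names are chosen here for illustration.

```python
from statistics import NormalDist

std = NormalDist()              # standard normal variate Z ~ N(0, 1)

z_low = std.inv_cdf(0.15)       # z with P(Z < z) = 0.15, about -1.04
z_high = std.inv_cdf(0.90)      # z with P(Z > z) = 0.10, about 1.28

# Solve  mu + z_low * sigma = 30  and  mu + z_high * sigma = 90.
sigma = (90 - 30) / (z_high - z_low)
mu = 30 - z_low * sigma

print(round(mu, 2), round(sigma, 2))
```

Substituting the fitted μ and σ back into the distribution reproduces the 15% and 10% tail areas, which is a useful check on the algebra.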


A. Theoretical questions:

1. Define probability and explain the importance of the probability in decision making.
2. What do you understand by (i) equally likely (ii) mutually exclusive and (iii) independent events?
3. Explain the concept of probability from the following:
   (a) Mathematical or a priori approach, (b) relative frequency or empirical approach
4. State the addition theorem of probability and illustrate it with suitable examples.
5. State the addition theorem of probability and illustrate it with suitable examples.
6. Give the classical definition of probability and state its limitations.
7. Explain with examples the concept of independent and mutually exclusive events in probability.
8. Define conditional probability ... Give examples of dependent ...
9. Explain the different approaches ...
10. What are the ... binomial distribution?
11. Define binomial distribution. Also explain ...
12. What are the conditions of Poisson distribution?
13. What are the important uses of Poisson distribution?
14. Compare the difference between binomial and Poisson distribution.
15. What is normal distribution? Define in terms of standard normal variate.
16. What are the characteristics of normal distribution?
17. Write the important applications of normal distribution.

B. Numerical and practical problems

1. Two coins are tossed simultaneously. What is the probability that (i) both are heads (ii) both are tails (iii) one head and one tail (iv) at least one tail.


2. A die is thrown twice. Determine the probability of getting
   (i) the sum of two faces is 6 (ii) sum of two faces is 12.

3. A bag contains 9 red, 7 white and 4 black balls. A ball is drawn at random. Find the probability of drawing
   (i) a white ball (ii) not a black ball (iii) ... (iv) a white ball or a red ball or a black ball.

4. A card is drawn at random from a pack of 52 cards. Find the probability of drawing (i) a black card (ii) not black (iii) a king (iv) either a card 2 or a card 3 (v) a red or a black card (vi) a red ... an ace (vii) a queen of spade (viii) a ... card of ...


number (d) it is Prime numb © ticket selected j 
er. 


Find the probability of selling
(i) 150 or more cars (ii) between 200 and 300 cars
(iii) less than 300 cars (iv) between 100 and 400 cars


The income of employees in an industrial concern is given below:

= Oc) [090 50:0 [nose ree 
| 90 | 150 “| i000] 

Find the probability that an employee selected at random has 

| (i) Income below Rs.100 


(ii) Income above Rs.200 
(iii) Income between Rs. 100 and Rs. 200 


You are given below the income distribution of 1000 persons 


. (8) 
7 1500-2000] 2000-2500] 2500-3000] 3000-3500 
| Noofdays | 150 | 250 |" 300 | 100 [ 9 | 7 | 30 


Find the probability that a person selected at random has . 


(i) income below Rs. 2000 (ii) income Rs. 2500 or more 


(iii) income more than Rs.2250 (iv) income less than Rs. 2750 
(v) income between Rs. 750 and Rs. 2650. 


(b) The distribution of 500 workers of a factory according to the sex and nature of work is as follows: 
rs 


If a worker is chosen at random, what is the probability that the worker is (i) Male and skilled (ii) Unskilled

10. Two cards are drawn at random from a deck of well shuffled 52 cards. Find the probability that the two cards drawn are
    (a) both red (b) both club (c) one red and one black

11. From a pack of 52 cards, three cards are drawn at random. Find the chance that
    (a) They are a king, a knave and an ace.
    (b) Two are from black and one is from red cards.
    (c) All are red cards.

12. A bag contains 4 red balls and 5 green balls. 3 balls are drawn at random. What is the probability that
    (a) All of them are green. (b) Two of them are red
    (c) All of them are of same colour.

13. Five men in a group of 20 are graduates. If 3 are chosen out of 20 at random, what is the probability that
    (a) all are graduates (b) none of them is graduate
    (c) at least one of them being graduate.

14. A class consists of 40 boys and 60 girls. If two students are chosen at random, what will be the probability that
    (a) both are boys (b) both are girls (c) one boy and one girl


Zz sind P (AV B). 
» 9.35, find P (ANB), P (Aug) 


ectivel = 1 
p a and =, 


ost. The probability of husband’, 


2 ne 0 
bility of happening °°" erview for pe semne” ‘ 
rin am interes at is tbe probability that (i) both eo them will be 
f them will be selected (iv) only one of 


2 5: 
lis Z Find the probability 


ji) none © 


selected ( 
them selected. 

18. The probability that a boy passes the examination is ... and that of a girl is ..., find that (i) both will pass the examination (ii) ... of them will pass the examination.

19. A person is applying for the post of manager in bank X and bank Y. The probability of his selection in bank X is 0.20 and in bank Y is 0.30. He has also chance of selection in both banks at the same time with probability 0.085. What is the probability that he will be selected in at least one of the banks?

20. One shot is fired from each of the two guns. G1 and G2 denote the events that the target will be hit by the first and second guns respectively. If P(G1) = 0.6 and P(G2) = 0.7 and G1 and G2 are independent events, find the probability that
    (i) Exactly one hit is registered (ii) the target will be hit?


21. A problem in Mathematics is gi 
athematics is given to three students A, B and C where the ch 
chances of solving it b 
y 


(ii) at least one 


11 I 
th _— _ =a 
em are 3,7 and = respectively. 


Find the probability that, 
: os of them can solve the problem 
] = of them will solve the probl 

€ problem will be solved wai 


22. The odds agai 
gainst A solyj 
14:10. What is ving a probl 
th és €m as 8: 
€ mobail hat) both ory ccs in favour of B 
nd B will s solvin 
ol g the same problem are 


it? (iii) At leas 
t one of th i 
; e 
3) WA and B are en te Problem ZOU 
solves it but B fails to solv? 


tw 
B) and P(B/ A) ° events such that P(A) 


] = 0.6 
4 and 
smd P(A By _ 2, P(AUB) = 0.58, find P(A/ 
. find P(A /B)an d 
P(B 
»P (AB) (B/A). Are A and B® 
(iv)p find 
oyak 
oe. . 2 Oss 
“atstically den "nd the pr 
i) ther Aspen di Obabilities P(A/C) and 
no ‘ - and P(C/A). 
'B will ae ae P(B) = Q 20 oe 
ii) B yi = 0.20, P(A or B)” 
ll 


Occur, given that A has 


26 


, certa : 


Probability 201 


24. In a school, 20% of students failed in English, 15% of students failed in Mathematics and 10% failed in both English and Mathematics. A student is selected at random. If he failed in English, what is the probability that he also failed in Mathematics?

25. The probability that a manufacturer will produce 'brand X' product is [...], the probability that he will produce 'brand Y' product is 0.28, and the probability that he will produce both brands is 0.06. What is the probability that the manufacturer who has produced 'brand Y' will also have produced 'brand X'?

26. The following information was obtained concerning 1000 employees of an industrial concern.

    Department            Male    Female    Total
    Manufacturing          280       220      500
    Production control     175       125      300
    Quality control        115        85      200
    Total                  570       430     1000

    If an employee is chosen at random, what is the probability that
    (a) Employee chosen is male given that he belongs to production control department.
    (b) Employee chosen is female from manufacturing department.

27. A bag contains 5 white and 3 black balls. Two balls are drawn at random one after the other without replacement. Find the probability that (i) both balls are white (ii) both are black, (iii) different colors, (iv) same colors.

28. Two cards are drawn successively one after another from a well-shuffled pack of 52 cards. If the cards are not replaced, find the probability that both of them are queens.

29. A lot contains 10 items of which 3 are defective. Three items are chosen from the lot at random one after another without replacement. Find the probability that all three are defective.

30. An urn contains 5 red, 7 white, and 8 black balls. Three balls are drawn one after another without replacement. Find the probability that they are in order of red, white and black balls.

31. A box contains 4 red and 6 white balls. Two balls are drawn one after another with replacing (i.e. with replacement) first ball before drawing second ball. Find the probability of getting (i) both red (ii) different colour (iii) same colour (iv) first red and second white in order.

32. Bag A contains 5 white balls and 3 black balls. Another bag B contains 4 white and 5 black balls. A ball is transferred from bag A to the bag B. Then a ball is drawn from the bag B. Find the probability that it will be a white ball.

33. In a group of equal number of men and women, 60% men and 80% women are employed in a certain town. A person is selected at random and found to be employed. What is the probability that the person is (i) a man (ii) a woman?

34. In a certain locality 80% of the people read the Kantipur, 75% of the people read the Gorkhapatra and 60% of people read both Kantipur and Gorkhapatra. A person is selected at random.
    (i) If he reads Gorkhapatra, what is the probability that he will also read Kantipur?
    (ii) If he reads Kantipur, what is the probability that he will read Gorkhapatra?
    (iii) What is the probability that he reads Kantipur or Gorkhapatra?
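
The conditional-probability exercises above all rest on P(A|B) = P(A and B) / P(B) and, for the bag-transfer problem, on the law of total probability. The following is a minimal sketch, not part of the original exercises, that checks two of them with exact fractions; the helper name `conditional` and the variable names are illustrative assumptions.

```python
from fractions import Fraction


def conditional(p_joint, p_given):
    """Return P(A | B) = P(A and B) / P(B); P(B) must be positive."""
    if p_given <= 0:
        raise ValueError("conditioning event must have positive probability")
    return p_joint / p_given


# Newspaper readership exercise: 80% read Kantipur, 75% read Gorkhapatra,
# 60% read both.
p_k = Fraction(80, 100)
p_g = Fraction(75, 100)
p_both = Fraction(60, 100)

p_k_given_g = conditional(p_both, p_g)  # reads Kantipur given Gorkhapatra
p_g_given_k = conditional(p_both, p_k)  # reads Gorkhapatra given Kantipur
p_k_or_g = p_k + p_g - p_both           # addition rule for the union

# Bag-transfer exercise (law of total probability):
# a white ball moves from A to B with probability 5/8, after which B holds
# 5 white out of 10; otherwise B holds 4 white out of 10.
p_white_from_b = Fraction(5, 8) * Fraction(5, 10) + Fraction(3, 8) * Fraction(4, 10)

print(p_k_given_g, p_g_given_k, p_k_or_g, p_white_from_b)
```

Working in `Fraction` rather than floats keeps the results directly comparable with the fractional answers printed in the answer key.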



; et at least two heads (2) a 


, a 
Find t n - c) 0.1641; d) 0.069. 
i (e) e078 p) 0.05475 ©) ) 0.0625; y 
t Ans a . 


in a 
5 head: m the bo 
jacement at random fro % Find the 
nee "yefectiVe> more than two are defectives 
0.7734; 9 ic ane Pr : iv) 0.7361 v 
lot contains 500 items of W ¢ defectives, ee 7: il) 0.9298; iii) 0.2639; iv) ) 0.0703 
re apitty that ci exactly ©" pans. i) 0.199 ese are if highly valued customers, i 
An accountant is to audit 24 accounts of a firm. Sixteen of these are of highly valued customers. If the accountant selects 4 of the accounts at random with replacement, what is the probability that he chooses at least one highly valued customer? [Ans. 0.988]

The average number of defective items in the manufacturing of an article is 1 in 10. Find the probability of getting exactly 3 defectives in a packet of 10 articles selected at random. [Ans: 0.0574]

On a very long mathematics test, Ram got 70% of the items right. For a 10 item quiz calculate the probability that Ram will get (i) at least 8 items right, (ii) less than 3 items right. [Ans. i) 0.3828 ii) 0.0015]


10% of the DVDs manufactured by a large electronics company are defectives. A quality control inspector selects ten DVDs from the production line. Find the probability that (i) exactly two are defective; (ii) at most two are defectives; (iii) at least two are defectives; (iv) more than two are defectives; (v) less than two are defectives.
[Ans: i) 0.1937; ii) 0.9298; iii) 0.2639; iv) 0.0702; v) 0.7361]

Determine the number of trials of the binomial distribution for which the mean is 4 and variance is 3.
[Ans. 16]
Fit the binomial distribution for the following data. 


[Suess [5 
Freqeney | 190 


Assume the probability of success in each case as 0.5 


ree ae 


Out of 1,000 families of 3 children each, how many families would you expect to have two boys and one girl, assuming that boys and girls are equally likely? (Ans: 375)

An unbiased coin is tossed six times. Find the probability of obtaining (a) exactly 4 heads (b) no heads.
Ans. (a) 0.2344 and (b) 0.01562


the Probab; 




The customer accounts of a certain departmental store have an average balance of Rs. 120 and a standard deviation of Rs. 40. Assume that the account balances are normally distributed.

(i) What proportion of the accounts is over Rs. 150?

(ii) What proportion of the accounts is between Rs. 100 and Rs. 150?

(iii) What proportion of the accounts is between Rs. 60 and Rs. 90?

[Ans. (i) 22.66%; (ii) ...; (iii) ...]
... and Co. manufactures chrome and glass lamps manually. It requires on average 40 labour hours to complete a lamp, following a normal distribution with a standard deviation of 10 hours.
(i) What is the probability that it will take between 35 and 42 labour hours?
(ii) What is the probability that it will take more than 48 hours?
(Given: Z0.2 = 0.0793, Z0.5 = 0.1915, Z0.8 = 0.2881) [Ans. (i) 0.2708; (ii) 0.2119]

A banker claims that the life of a regular saving account opened in his bank averages 18 months with a standard deviation of 6.45 months. What is the probability that:
(i) there will still be money in a savings account between 20 to 22 months by a depositor.
(ii) the account will be closed (no money in the deposit) after two years?
(Given: Z0.31 = 0.1217, Z0.62 = 0.2324, Z0.93 = 0.3238) [Ans. (i) 0.1107; (ii) 0.1762]

The mean weight of products is 68.22 grams with a variance of 10.8. How many products in a batch of 1000 would you expect (i) to be over 72.0 grams (ii) between 70 and 72 grams.
(Given: Z1.15 = 0.3749, Z0.54 = 0.2054) [Ans. (i) 125; (ii) 170]


The income of a group of 10000 persons was found to be normally distributed with mean Rs. 750 per month and standard deviation Rs. 50. Find (i) the number of persons with income less than Rs. 700 p.m. (ii) the number of persons with income between Rs. 700 and Rs. 800 p.m.
[Ans. (i) 1587; (ii) 6826]

Incomes of a group of 10000 persons were found to be normally distributed with mean Rs. 520 and standard deviation Rs. 60. Find
(i) the number of persons having income between Rs. 400 and Rs. 550.
(ii) the lowest income of the richest 1000 persons.
[Ans. (i) 6687; (ii) Rs. 596.80]

The heights of 1000 students follow a normal distribution with μ = 66" and σ = 2".
(i) How many observations may be expected to lie between 63" and 69"?
(ii) Find the height in inches beyond which 10% of the students would lie.
(Given: Z1.5 = 0.4332; Z0.1 = 0.0398; Z0.253 = 0.1000; Z1.28 = 0.3997)
[Ans. (i) 866; (ii) 68.56]

The weekly wages of 1000 workmen are normally distributed around a mean of Rs. 70 and with a standard deviation of Rs. 5. Estimate the number of workers whose weekly wages will be:
i. Between Rs. 70 and 72.    ii. Between Rs. 69 and 72.
iii. More than Rs. 75.       iv. Less than Rs. 63.
v. More than Rs. 80.
Also, estimate the lowest weekly wages of the 100 highest paid workers.
[Ans. (i) 155; (ii) 235]
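
The normal-distribution exercises above are all solved the same way: convert the limits to standard scores, read areas from the table, and multiply by the group size. As a hedged illustration, not the textbook's own method, the sketch below redoes the income exercise (mean Rs. 520, standard deviation Rs. 60, 10000 persons) with Python's standard-library `statistics.NormalDist`; the variable names are assumptions made for this example.

```python
from statistics import NormalDist

# Income exercise: mean Rs. 520, standard deviation Rs. 60, 10000 persons.
income = NormalDist(mu=520, sigma=60)
group_size = 10_000

# (i) Expected number of persons with income between Rs. 400 and Rs. 550.
share_between = income.cdf(550) - income.cdf(400)
count_between = group_size * share_between

# (ii) Lowest income of the richest 1000 persons: the 90th percentile,
# because 1000 out of 10000 is the top 10%.
cutoff_richest_1000 = income.inv_cdf(0.90)

print(round(count_between), round(cutoff_richest_1000, 2))
```

The percentile computed this way uses an unrounded standard score (about 1.2816), so it can differ in the last digits from an answer obtained with the rounded table value z = 1.28.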


ean was found to be 50 and Standarg 


deviation 20. 

Find the num 

| i fe 
Ul Find the value of the s' 


i securing 
of students S| 
* re exceeded by the top 


was 42 and standard deviati 
ber of students lying between ; 
pans. (i) 371; (il) 383; (iy 7 


st admin! 


n intelligence © aad 
ne students. 


55, Ina 
24, Find (i) the number of se eding , 
and $4, (iii) the value of score exce die vamately normally distributed with a mean of 75 ang 
marks is appro rade A and the bottom 25% get gra fe 


of examination 


g. a A sel 
. jati 5% 0 
standard deviation of 5. If the top siege 
F. what mark is the lowest A and what mark 1s the hig hid | 
nd to be normally distributed with mean Rs. 5099 


£20000 persons were found to 
500, find (1) lowest incom 
[Ans. (a 


in mix have a normal distributio 


eo 
) 83.72 (b) (i) Rs. 5640 (ii) Rs. 4360] 


n with a mean of §, 75 


b. Income of a group © 
and standard deviation Rs. 
income of poorest 2000 persons. 


57. The highest of 1000 cakes baked with a certa 
cm, and a standard deviation of 0.75. Find 
| the number of cakes having heights between 5 cm a 
ii. the maximum height of the flattest 200 cakes. [Ans. (i) 589 (ii) 5.12 em] 
58. Given a normal distribution with 1 =50 and o = 10, find the v i 
; alue of i 
the flattest 200 cakes. a i a os - pe wa 
ns. (i) 38.7; (ii) 67.5] 
Assume that the marks in M.B.A. examinati ; 
B.A. tion ; : 
100 of 600 students taking this examination, it arn ee 
SSiuenhistekopeaite Die pauline) , red to pass 500 of them. What should be the 
60. The mean 1.Q. intelli Ans. 303 
sigeaacs gence quotients of a lar ‘ [Ans. 
assuming that ee ge number of ch 
rs ae = distribution was normal. Find between ae ‘ “ a Lee WO 
its the 1.Q.'s of the middle 40% of 
in vt re: e€ 0 
examination, average marks secured by 400 stud [Ans. 91.52 and 108.48] 
ents iS 45 with Ss d : 
.d. of 10. Assuming the 


distribution to be norm: 
al, find (i) th 
Tange of marks within whi ae poner ste 
h ents s 
ch middle 50% of students would fier Ene leidlag el ney 
ns. (i) 97 (ii) 38.3; 51.7;314 


62. see examination 15% 
uring below 40% rien 
,) a : ie (60% marks or above), while 40% failed 
tmally distributed, estimate the mean 


nd 6.25 cm and 


59. 


\o 


61. 


of the candidates 


Standard deviation. Assuming the ma 


(Given Zo1= > = 
1 25, Los 1.28, Zo35 l 04 Z 3 } 
0 Ut, £0.15 = 0. 8 
63 % of the i S are un : 9 r 64 I n 
; item Tr Vv 
ae der 5 and 8% a 
€ Ove 6 . 1 


5) 


[Ans. 43.88, 15.50] 
64. Of a large Spat a ie 
[Ans. 18. 50, 101 


65.42, 3.29 
40 percent are between 64 and 


OnE Af EE 60 thas § 
al distribution ae In height and 
: t 


65. At a 
certain examinat; 
sh mination, 190 Mean heigh 
a : : iat 
find the ree 97% of the cease Studen Spo 7 
and the standard deri ess t we 63 appeared for th 
ation of Mark: e Paper } ed thal 
: . in stat less 
Nee A istics get | 


‘ecye ASSUMj ' 
'stribution. Ming the distribution to be norm?" 
[Ans. 20, 1 = 42.7, = 10.13) 


67- 


638. 


10 
ly 


12 
13 


Iq 


i . Si Probabili 205 
didates appeared in a certain examination paper carrying a maximum of 100 : 
mo marks. 


in 
50 0) <4 marks were normally distrib i It 
6? 4 that the Sirbuted with mean 39.5 My 
a ; 5 and st iati 
fo i the approximate number of candidates who secured a first class ties Chen oe 
! er = minimum oO 
fi ner 
Find the probability of getting 5 heads in 12 tosses of a fair coin by using
(i) Binomial distribution
(ii) Normal approximation to Binomial distribution. [Ans. (i) 0.1934; (ii) 0.1975]

Ten percent of the tools produced in a certain manufacturing process turn out to be defective. Find the probability that in a sample of 10 tools chosen at random exactly two will be defective by using (i) the binomial distribution (ii) Poisson approximation to the Binomial distribution. [Ans. (i) 0.194; (ii) 0.184]
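
The last two exercises compare an exact binomial probability with its normal and Poisson approximations. The sketch below is one way to check such results; it is an illustration under stated assumptions rather than the book's worked solution. The helper names `binom_pmf` and `poisson_pmf` are hypothetical, and the normal approximation uses a continuity correction (the interval 4.5 to 5.5 for "exactly 5"), which is an assumption about how the approximation is applied.

```python
from math import comb, exp, factorial, sqrt
from statistics import NormalDist


def binom_pmf(k, n, p):
    """Exact binomial probability of k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson probability of k events when the mean is lam."""
    return exp(-lam) * lam**k / factorial(k)


# 5 heads in 12 tosses of a fair coin.
coin_exact = binom_pmf(5, 12, 0.5)
coin_normal = NormalDist(mu=12 * 0.5, sigma=sqrt(12 * 0.5 * 0.5))
# Continuity correction: "exactly 5" becomes the interval from 4.5 to 5.5.
coin_approx = coin_normal.cdf(5.5) - coin_normal.cdf(4.5)

# Exactly 2 defective tools in a sample of 10 when 10% are defective.
tools_exact = binom_pmf(2, 10, 0.10)
tools_poisson = poisson_pmf(2, lam=10 * 0.10)

print(round(coin_exact, 4), round(coin_approx, 4))
print(round(tools_exact, 3), round(tools_poisson, 3))
```

Because the normal approximation here is evaluated without rounding the standard scores, its value need not agree with a table-based answer to the last digit.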


a 


Answers 
B Numerical and Practical Problems 
Pa ao icy 
.@ 3 (ii) 4 (iii) > (iv) r 
“fit of! Sil 
1 0) 36 (ii) 36 
ag dg eae 8 
3) 59 (ii) = (iti) 59 (iv) 1 
ae ee ‘ug 3 
4, (i) 3 (ii) > (ill) 73 (iv) 73 
(y) 1 wi) 5 (vii) S (viii) 
6 
*) 36 Gi) = 
5 (b) 39 (c) 3 5 
2 
ue 
© (@) 1.0.85, ii 0.30, ili. 0.55, iv. 0.85 (b) i. 0.48, ii. 0.16, iii, 0.36 
% (a) 1.0.71, ii. 0.12, iii. 0.16, iv. 0.825, v- 0.536 (0) i. 0.5 and ii. 0.2 
10, (q) 325 616. 
®) 326 (b) s (c) 7326 
it 
(a) 0.0029 (b) 0.3829 (c) 0.1176 
Ra 3 5 1 
) 2 (b) 74 ©) 6 
Boa 1 137 
©) ih (by a8 (c) 228 
14 (a 26 59 16 
) 16 (ists (c) 33 


, 24 : 
I (ii) 35 19. 04 > “a 
) 35 i 2 
“ : Ss 21. (3) 60 
18. (i) 10 | | 
46 (ii) 0.88 (ii) 0.76! 
ae aie 5 dent 
‘0.25 (ii) ae 4p are not indepen _ 
aan 13 (b) 334mm 2 (iv) 3 
ee 5 (ii) F 
«)@ 3 (ii) 5 
cul ? | 
(i) 0.55 (fi) 0.2857 ote 
3 ° 0.583 
ee 26. (a) | 
5 25, 0.214 mee ae 
24. 0. | 7 
27. (i) 5/14, (ii) 3/28, (iii) 15/28, (iv) 13/28
31. (i) 0.16 (ii) 0.48 (iii) 0.52 (iv) 0.24
32. 37/80    33. (i) 3/7, (ii) 4/7    34. (i) 0.8 (ii) 0.75 (iii) 0.95


Exercise 4.3
Multiple Choice Questions: circle (O) the correct answer.


1. The outcome of tossing a coin is a:
(a) simple event (b) mutually exclusive event
(c) complementary event (d) compound event
2. Classical probability is measured in terms of:
(a) an absolute value (b) a ratio
(c) absolute value and ratio both (d) None of the above
3. Probability can take values:
(a) -∞ to ∞ (b) -∞ to 1 (c) -1 to 1 (d) 0 to 1
4. Probability is expressed as:
(a) ratio (b) proportion (c) percentage (d) all the above


there js No co 


hem 
™Mmon point in between 
both the even 


ts have Only one point 


6. If A and B are two events which have no point in common, the events A and B are:
(a) complementary to each other (b) independent
(c) mutually exclusive (d) dependent
7. Classical probability is also known as:
(a) Laplace's probability (b) mathematical probability
(c) a priori probability (d) all the above
8. Each outcome of a random experiment is called:
(a) primary event (b) compound event
(c) derived event (d) all the above
9. If A and B are two events, the probability of occurrence of either A or B is given as:
(a) P(A) + P(B) (b) P(A ∪ B) (c) P(A ∩ B) (d) P(A) P(B)
10. If A and B are two events, the probability of occurrence of A and B simultaneously is given as:
(a) P(A) + P(B) (b) P(A ∪ B) (c) P(A ∩ B) (d) P(A) P(B)
11. The limiting relative frequency approach of probability is known as:
(a) statistical probability (b) classical probability (c) mathematical probability
(d) all the above
12. The definition of statistical probability was originally given by:
(a) De Moivre (b) Laplace (c) Von-Mises (d) Pascal
13. The definition of a priori probability was originally given by:
(a) De Moivre (b) Laplace (c) Von-Mises (d) Feller
14. If it is known that an event A has occurred, the probability of an event E given A is called:
(a) empirical probability (b) a priori probability
(c) posteriori probability (d) conditional probability
15. The probability of Mr R living 20 years more is 1/5 and that of Mr S is 1/7. The probability that at least one of them will survive 20 years hence is:
(a) 12/35 (b) 1/35 (c) 13/35 (d) 11/35


: 11 oF re 
16, Given that P(A) = > P(B) = 2 and P(A VU B) = 2 probability, P(BIA) 1S. 
(d) none of the above 


@) 1/6 (b) 4/9 © m 
| 1, What is the minimum value of probability? (d) - 
(a) 1 (by 100 Cm. 
18, 


In binomial distribution which relation is true, 
| (a) Mean=Variance (b) Mean< variance 


* Iffor a binomial distribution b(n, p) = 4 and also 


(c) Mean 2 variance (d) Mean < variance 


the value of Pis 


p(x=2)=3 PX=3) : 
(d) 13 


3 (©) 3 
os (b) 1 3 


20, In which discre 
b) 
binomial ( are : O 
yp, MD and SD are 
the Q (d) 10:13: 47 


21. Fora normal distribution. 
(b) 10: 


§:6:7 


(a) 


(a) - aeibtiO 

92. X= NRO >) the poin ormal distribut! 
(b) to mee 
ifn = I, the distribution of X reduces to 


(a) +H 
- is a binomial variate with parameters 7” and P| 
(b) Binomial distribution 


(a) Poisson distribution 

(c) Bernoulli distribution (d) Normal distribution 
d normal curve beyond lines z 
(c) 5% 


24, The are under the standar =+ 1.96 is 
(d) 10% 


(a) 95% (b) 90% 


3d) | 9. (b) | 10. (c)[ 11. @)] 12.01 
7 - 120. (e)|21. (6) | 22. (b) | 23. (e) | 24. es 3.) 


14. (d) = 16. (c) 


5 Sample Survey


5.1 Concept of Population and Sample

Population: When we think of the term "population," we usually think of people in our town, state or country. However, in statistics, the term "population" takes a slightly different meaning.

Definition: Population or universe is the aggregate of objects under study in any statistical investigation. In statistics, population refers to the totality or aggregate of all units or items under investigation. In other words, the group of items or units under study (investigation) is called the population. There are two types of population: the target population and the sampling population.

The population or universe may be finite or infinite. A population having a finite number of objects or items is known as a finite population. For example, the total number of students of T.U. in BCA, the total number of books in a library, the total number of households in an area etc.

On the other hand, a population having an infinite number of objects is called an infinite population. For example, the population of stars in the sky, the total number of fishes in an ocean, the total number of trees in a forest etc. are so large that they are treated as infinite populations.

In a statistical investigation, the investigator usually deals with the general magnitude and the study of variation with respect to one or more characteristics relating to individuals belonging to a group. The group of individuals under study is called the population or universe. A population is an aggregate or collection of objects, animate or inanimate, defined according to the characteristics under study. The population may be finite or infinite.

Complete enumeration of all units of the population is called a census. In many statistical investigations complete enumeration is not practicable. In a statistical investigation the interest usually lies in the assessment of the general magnitude and the study of variation with respect to one or more characteristics relating to individuals belonging to a group. The group of individuals under study is called the population or universe. Thus in statistics, population is an aggregate of objects under study.

It is obvious that for any statistical investigation complete enumeration of the population is rather impractical. For example, if we want to have an idea of the average per capita income of the people of Nepal, we will have to enumerate all the earning individuals in the country, which is rather a very difficult task. If the population is infinite, complete enumeration is not possible. Also, if the units are destroyed in the course of inspection, 100% inspection is not possible. Complete enumeration is often not undertaken because of a multiplicity of causes, viz., administrative and financial implications, time factors etc., and we take the help of sampling.

Definition: A finite subset of statistical individuals in a population is called a sample, and the number of individuals in a sample is called the sample size. In short, a sample is a portion of the population that is used to represent it.

For the purpose of determining population characteristics, instead of enumerating the entire population, only the individuals in the sample are observed. The sample characteristics are then utilized to approximately determine or estimate the population characteristics. For example, on examining a sample of a particular stuff we arrive at a decision of purchasing or rejecting that stuff.

Sampling is quite often used in our day to day practical life. For example, in a shop we assess the quality of sugar, wheat or any other commodity by taking a handful of it from the bag and then decide to purchase it or not. A housewife normally tests the cooked products to find if they are properly cooked and contain the proper quantity of salt.

i) Sample: A part of the population which is considered for study and analysis, and which represents the population with the objective of investigating its characteristics, is called a sample. The number of units in the sample is known as the sample size.

ii) Sampling frame: A complete list of the sampling units is popularly known as the frame. In other words, it is a list of all the sampling units containing the elements of the population.

iii) Sampling units: The choice of the sampling unit is often dependent upon the sampling frame and partially dependent upon the overall design and purpose of the project.
5.2 Needs of Sampling 


Sampling is carried out because of the following reasons:
i) Sampling saves time and money. A sample study is less expensive than a census and gives faster results.
ii) It is the only way if the test involves the destruction of the items under investigation. A sample study prevents the items from destruction if the items are of destructive nature.
iii) It is the only way for an infinite population. For an infinite population complete enumeration is not possible, so the sample method is the only method of collecting information.
iv) It enables the sampling error to be estimated. The sample estimate gives the error from the population.
v) It enables more accurate results. Sampling conducted by well trained and experienced investigators gives accurate results.
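
To make reasons (i) and (iv) concrete, the following is a small simulation sketch, not taken from the text: it builds a hypothetical population of 10000 incomes, compares the census mean with the mean of a simple random sample of 400, and estimates the sampling error from the sample alone. The population values, the sample size and the fixed seed are all assumptions chosen only for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so that repeated runs give the same numbers

# Hypothetical finite population: 10000 monthly incomes (illustrative values).
population = [random.gauss(50_000, 12_000) for _ in range(10_000)]
census_mean = statistics.fmean(population)  # what a full census would report

# Simple random sample of 400 units drawn without replacement.
sample = random.sample(population, 400)
sample_mean = statistics.fmean(sample)

# Estimated standard error of the sample mean, with the finite population
# correction because the sample is drawn without replacement.
n, N = len(sample), len(population)
std_error = statistics.stdev(sample) / n**0.5 * (1 - n / N) ** 0.5

print(f"census mean      : {census_mean:,.0f}")
print(f"sample mean      : {sample_mean:,.0f}")
print(f"est. std. error  : {std_error:,.0f}")
```

Only 4% of the units are measured here, yet the standard error computed from the sample itself indicates how far the sample mean is likely to be from the census value.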


5.3 Census and Sample Survey 


Survey is the technique of investigation by direct observation of a phenomenon or by collection of information through interview. The term survey is used in a broad sense.

The objectives of the survey are as follows:
i) Supply of information
ii) Explanation of a phenomenon
iii) Description of the phenomenon


Census 


The survey carried out by selecting a representative sample of the study population is known as a sample survey. For example, Nepal living standard survey, family planning survey etc.


5.4 Basic Concept of Sampling

Generally, sampling has been used not only in statistical investigation and research work but also in the course of daily life of human beings. For example,
i) A housewife tests a very small quantity of the food that she is cooking.
ii) A doctor tests a drop of blood of a patient to know about the characteristics of the blood of the patient.
iii) A businessperson gives orders for commodities by examining a sample of them.

In such practical decision making processes, conclusions, decisions and findings depend on a part of the aggregate or totality. This process of investigation is known as sampling.


55 Census versus Sampling 


Definition: A census is a study of every unit, everyone or everything in a population. That is, the complete enumeration or count of all units of the population is known as a census. Hence, the complete enumeration of all units of the population is also known as a census survey.

The term census is used mostly in connection with national population censuses; other common censuses include the agriculture census, industrial census survey etc. A census requires more money, manpower and time.

Definition: The method or process of selecting a sample from a population under study is called sampling. A sample is a subset of units in a population selected to represent all units in the population. It is a partial enumeration because it is a count from part of the population. Therefore, the process (or survey) in which only part of the population is selected and examined to estimate certain characteristics of the population is known as a sample survey. That is, the enumeration of the selected units is known as a sample survey. Information from the sampled units is used to estimate the characteristics of the entire population. A sample survey will usually be less expensive than a census survey and the desired information will be obtained in less time.
When and Where Sampling/Census is Appropriate: 
A sampling technique is appropriate 
a. When the universe is very large 
b. When the universe possesses homogeneous characteristics
c. When utmost accuracy is not required
d. Where census is impossible, i.e. in destructive/explosive nature of testing.

A census is appropriate when
a. The universe is small
b. The population is heterogeneous
c. Hundred percent accuracy is required
d. The population frame is incomplete
Demerits of Sampling Technique:
i) Less accuracy
ii) Misleading conclusions
iii) Need of specialized knowledge
iv) It cannot be used if information on each and every unit of the population is required.

Demerits of Census:
i) Expensiveness
ii) Excessive need of time and manpower
iii) Not applicable for destructive tests
iv) Not possible for an infinite population


i) 


ii) 


ii 


Sampling | 
e part of the population. 
5. qd less time consuming, i.e. 
ie sranpowel and time. 
on the universe is large. 
; ppropriate when _ the universe — possess 
universe is small. es : ae 
: a sae Oe. “homogeneous char. 
opulation is hetero : 
5.7 Organizational Aspect of Sampling Survey
5.7.1 The Basic Organizational Aspect of Sampling Survey


Objective of the Survey 


The first step in sampling is to set the objective of the survey in clear and simple terms. The sponsor of the survey should take care that these objectives are appropriate regarding the available resources in terms of money, manpower and the time limit required to get the results of the survey.


Population to be Sampled 
The population of objects from which the sample is chosen should be defined in clear and unambiguous terms. The definition of the population may present no problem, as when sampling a batch of electric light bulbs in order to estimate the average length of life of a bulb. In sampling a population of farms, on the other hand, rules must be set up to define a farm, and borderline cases arise. These rules must be usable in practice: the enumerator must be able to decide in the field, without much hesitation, whether or not a doubtful case belongs to the population. The population to be sampled (the sampled population) should be matched with the population about which information is desired (the target population).


Data to be collected


Degree of Precision Desired
The results of sample surveys are always subject to some uncertainty because only part of the population has been measured and because of errors of measurement. This uncertainty can be reduced by taking larger samples and by using superior instruments of measurement. But this usually costs time and money. Consequently, the specification of the degree of precision wanted in the results is an important step. This step is the responsibility of the person who is going to use the data. It may present difficulties, since many administrators are unaccustomed to thinking in terms of the amount of error that can be tolerated in estimates, consistent with making good decisions.
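
The trade-off between precision and sample size can be illustrated with the usual large-sample formula n = (z σ / E)², where E is the tolerated margin of error for a mean and σ is the population standard deviation. The sketch below is an assumption-laden illustration, not a procedure given in the text: the function name `required_sample_size` is hypothetical, σ is assumed known (in practice it would come from a pilot study or earlier survey), and simple random sampling from a large population is assumed.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n for which the margin of error of a sample mean is at most
    `margin`, assuming simple random sampling from a large population with
    known standard deviation `sigma`."""
    if sigma <= 0 or margin <= 0 or not 0 < confidence < 1:
        raise ValueError("sigma and margin must be positive, confidence in (0, 1)")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return ceil((z * sigma / margin) ** 2)


# Illustrative figures: standard deviation Rs. 60, tolerated error Rs. 10 or Rs. 5.
n_for_10 = required_sample_size(sigma=60, margin=10)
n_for_5 = required_sample_size(sigma=60, margin=5)
print(n_for_10, n_for_5)
```

Halving the tolerated error roughly quadruples the required sample, which is why the degree of precision has to be fixed before the cost of the survey can be judged.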


Methods of Measurement
There may be a choice of measuring instrument and of method of approach to the population. The survey may employ a self-administered questionnaire, an interviewer who reads a standard set of questions with no discretion, or an interviewing process that allows much latitude in the form and ordering of the questions. The approach may be by mail, by telephone, by personal visit, or by combinations of these three items.

vi) Sampling Frame
Sometimes the appropriate sampling unit is obvious, as in a population of light bulbs, in which the unit is the single bulb. Sometimes there is a choice of unit. In sampling the people in a town, the unit might be an individual person, the members of a family, or all persons living in the same city block. In sampling an agricultural crop, the unit might be a field, a farm or an area of land whose shape and dimensions are at our disposal.

The construction of this list of sampling units, called a frame, is often one of the major practical problems. From bitter experience, samplers have acquired a critical attitude towards lists that have been routinely collected for some purpose. Despite assurances to the contrary, such lists are often found to be incomplete, or partly illegible, or to contain an unknown amount of duplication.

vii) Selection of proper sampling design

There are varieties of plans by which the sample may be selected. For each plan that is considered, rough estimates of the size of sample can be made from a knowledge of the degree of precision desired. The relative cost and time involved for each plan are also compared before making a decision.
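
The two steps just described, building a usable frame and then selecting the sample from it, can be sketched in a few lines. This is a minimal illustration under assumptions, not a method prescribed by the text: the helper names `clean_frame` and `simple_random_sample`, the household labels, and the choice of simple random sampling as the plan are all hypothetical.

```python
import random


def clean_frame(raw_units):
    """Build a sampling frame from a raw list: drop blank entries and
    duplicates (compared case-insensitively), keeping first occurrences."""
    seen = set()
    frame = []
    for unit in raw_units:
        key = unit.strip().lower()
        if key and key not in seen:
            seen.add(key)
            frame.append(unit.strip())
    return frame


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame, each subset equally likely."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the frame")
    return random.Random(seed).sample(frame, n)


# Hypothetical raw list with a duplicate and a blank entry.
raw_list = ["House 1", "house 1 ", "House 2", "", "House 3"]
frame = clean_frame(raw_list)
chosen = simple_random_sample(frame, 2, seed=7)
print(frame, chosen)
```

Cleaning the list first matters because a unit that appears twice in the frame would otherwise have twice the chance of selection.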


viii) The Pretest

It has been found useful to try out the questionnaire and the field methods on a small scale. This nearly always results in improvements in the questionnaire and may reveal other troubles that will be serious on a large scale, for example, that the cost will be much greater than expected.

ix) Organization of field work

In an extensive survey many problems of business administration are met. The personnel must receive training in the purpose of the survey and in the method of measurement to be employed and must be adequately supervised in their work. A procedure for early checking of the quality of the returns is invaluable. Plans must be made for handling non-response, that is, the failure of the enumerator to obtain information from certain of the units in the sample.

x) Summary and Analysis of the Data

The first step is to edit the completed questionnaires, in the hope of amending recording errors, or at least of deleting data that are obviously erroneous. Decisions about computing procedure are needed in cases in which answers to certain questions were omitted by some respondents or were deleted in the editing process. Thereafter, the computations that lead to the estimates are performed. Different methods of estimation may be available for the same data.

In the presentation of results, it is good practice to report the amount of error to be expected in the most important estimates. One of the advantages of probability sampling is that such statements can be made, although they have to be severely qualified if the amount of non-response is substantial.


Speci 
Selection of sampling 
Determination of the sample size
Specify the sampling plan 
Select the sample 

Pretest the tool/s 


Step 11 


Step12 : 
Step 13. : Summary and analysis of the data. 


5.8 Questionnaire Design

Definition: A group of questions prepared for any survey is known as a questionnaire.

Questionnaire design depends on the type of information that is required to be collected. Qualitative questionnaires are used when there is a need to collect exploratory information or to prove or disprove a hypothesis. Quantitative questionnaires are used to validate or test a previously generated hypothesis. The characteristics of questionnaires are given below:

• Uniformity of the questionnaire: These questionnaires are very useful to collect demographic information, personal opinions, facts or attitudes from respondents. The most important characteristic of a questionnaire is that it is standardized and uniform. Each and every respondent faces the same questions. This type of questionnaire helps in data collection and statistical analysis. For example, questionnaires on BCA students, on the quality of a cell phone and many more.

• Exploratory: For the collection of qualitative data, the questionnaire could be exploratory. There is no restriction on the questions that can be used in this questionnaire. For example, a questionnaire may be given to a boy to understand his expenditure.

• Question sequence: The questionnaire typically follows a structured flow of questions to increase the number of responses. The sequence of questions is: screening questions, warm-up questions, transition questions, skip questions, difficult questions and classification questions.

Conducting a survey is a multistep process that requires attention to detail at every step. A survey in itself is a complicated process, and designing a questionnaire is equally complicated because a survey can be based on varied topics (not all at once) with varied details. Researchers are always hoping that the responses they get yield good data. At the end of the day it all boils down to how good or bad the data received through these surveys are. If the questionnaire is too complicated, there is a fair chance that the respondents may get confused and be unable to respond aptly.

A survey creator may administer the questionnaire to a focus group during the development process to get an idea of how that group comprehends and responds to it. Pre-testing is done at the initial stage to see if there are any changes required.

Steps in Questionnaire Design
• Identify what you want to cover in a questionnaire.
• Do not mince your words.
• Ask only one question at a time.
• Be flexible with your options.
• Open-ended or closed-ended question: it is a tough choice.
• It is important to know your audience.
• Choosing the right tool is important.
Principle of Questionnaire
The main principles of questionnaire design concern the following sensitive points:

(i) Clear and simple words              (ii) Avoid lengthy questions
(iii) Ambiguous and vague words         (iv) Biased words and implication language
(v) Leading questions                   (vi) Double-barreled questions
(vii) Negative questions
A questionnaire aims to collect information from a respondent. Questionnaires are typically a mix of close-ended questions and open-ended questions; long form questions offer the respondent the ability to elaborate on their thoughts. The Statistical Society of London first developed the questionnaire. A questionnaire is used for research purposes, which can be both qualitative as well as quantitative in nature. A questionnaire may or may not be delivered in the form of a survey, but a survey always consists of a questionnaire.

Examples:
1. Customer Satisfaction Questionnaire
2. Product Use Satisfaction Questionnaire
3. Company Communications Evaluation Questionnaire etc.

These questionnaire examples help prove that questionnaires are typically cheaper to execute than surveys. They often have standardized answers that are used to compile data. They are limited by the fact that the respondent must be able to read all the questions and respond to them.

T 
Yes of Questionnaire 


No ; | 
W We discuss structured or Unstructured Questio 
. . 
Structured questionnaires 
gned to collect very spec? 


and checks previo 


nnaire. 
“ative data. The 


S 
‘ructured Questionnaire: 
ue is planned and desi 
ormal enquiry, supplements data 
Prior hypothesis. tects 
? ‘ c 
mnaire "9 : questi ns but nothing that 


enerally 2 "yes/no" close-endey 


‘ -. validation. It is the eas; 
whi re the Te onde e dichotom . je f need of basic Slest 
uestions: sed in C8 
-notomous QUES’ erally U 
• Multiple-choice Questions: This is a closed-ended question type in which a respondent has to select one (single-select multiple choice question) or many (multi-select multiple choice question) responses from a given list of options. A multiple choice question consists of an incomplete stem (question), the right answer or answers, close alternatives and distracters. Not all questions have all of these answer types; they can be used as deemed fit or as best matches the expected outcome of the questions.

• Scaling Questions: This type is widely used. These questions are based on the principle of the 4 measurement scales: nominal, ordinal, interval and ratio. Some question types that utilize the fundamental properties of these scales are rank order questions, Likert Scale questions, Semantic Differential Scale questions and Stapel Scale questions.


• Pictorial Questions: This is the second-easiest type of questionnaire question. Respondents choose from a set of images, which limits their response to the options in the question but increases the number of responses.


Types of Questionnaires Based on Distribution


Telephone Questionnai 
onnaire: [n thj 
Collect responses, In thi Mik pe 
of time should i. "us tnethod, the responses are quick. H €s a phone call to a respondent '° 
given for much j : SS Flawe 
a questionnaire. The ony ch information Over the phon 
€. Iti 


: ple also ma 
ieaaiic hciiteiara: Y not be a representative of th 


& € of the Tespondents. The type is conducted by dis 
Carc 


1 a natur, 
I ] adva : 
a! and comfy ble tate. of this type of attest 
Ment and j estio 
in-d 


r, the disadvantage is that a !0! 
S expensive way of conducting 
€ whole population. 

her that visits the home the work 
nnaire is that the respondent is" 


€pt dat 
; a can be collected. The disadvantae’ 
1Onnaires 
*thod Ting ob 
s . adit 
filled in &S a rege id Olete but stil] being used in so™ 


fisure Sent back a “ending a physical questionn® 
‘nd hence nee © advantage of this method is that 
Sive pee truthfully and entirely: 


Ime Cc : ‘ohh 
es °nsuming. There is also 4 hie 


ao 


Characteristics of a Good Questionnaire
• The questionnaire should deal with a topic significant enough to create interest among respondents.
• It should seek only that data which cannot be obtained from other sources.
• It should be attractive.
• Directions should be clear and complete.
• It should be presented in good psychological order, proceeding from general to more specific.
• Double negatives in questions should be avoided.
• Putting two questions in one question should also be avoided. Every question should seek to obtain only one specific piece of information.
• It should avoid annoying or embarrassing questions.
• It should be designed to collect information which can be used subsequently as data for analysis.
• It should consist of a written list of questions.
• The questionnaire should contain questions which can be answered with a minimum of writing.
• "Why" questions should be included as supplementary questions with choice answers.
• The arrangement of the questions should be simple to tabulate.
• Bad questions should be avoided. Examples of bad questions are those that are too long, complex, personal, ambiguous, leading, non-relevant, embarrassing etc.
• Words that are difficult for respondents to understand should be avoided as far as possible, and precautions should be taken when such words are used.
Advantages of Questionnaires
• Practical and less expensive; can save time, human resources and financial resources.
• A large amount of information can be collected from a large number of people in a short period of time.
• Can be carried out by the researcher or by any number of people with limited effect on validity and reliability.
• The results of questionnaires can usually be quickly and easily quantified, either by a researcher or through the use of a software package.
• Can be analyzed more "scientifically" and objectively than other forms of research.
• When data has been quantified, it can be used to compare and contrast other research and may be used to measure change.
• Positivists believe that quantitative data can be used to create new theories and to test hypotheses.
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5.9 Principles of Sample Survey 


The sampling theory is based on the following principles: 


Principle of Statistical Regularity

The principle of statistical regularity is derived from the theory of probability in mathematics. According to this principle, when a large number of items is selected at random from the universe, the selection is likely to possess the same characteristics as the entire population.

This principle asserts that the sample selection is random, i.e. every item has an equal chance of being selected. A sample selected randomly, and not deliberately, is believed to act as a true representative of the population. Thus, this principle is characterized by a large sample size and the random selection of a representative sample.
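The idea of statistical regularity can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the universe (the integers 1 to 10000), the sample size of 1000 and the seed are all hypothetical choices, not values from the text.

```python
import random
import statistics


def random_sample_mean(population, n, seed=None):
    """Mean of a simple random sample of size n drawn without replacement."""
    rng = random.Random(seed)
    return statistics.mean(rng.sample(population, n))


# Hypothetical universe: the integers 1..10000.
population = list(range(1, 10001))
population_mean = statistics.mean(population)

# A large random sample should have a mean close to the population mean.
sample_mean = random_sample_mean(population, n=1000, seed=42)

print(f"population mean = {population_mean}")
print(f"sample mean     = {sample_mean:.1f}")
```

Changing the seed gives a different sample, but for a sample this large the sample mean should stay close to the population mean.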


Principle of 'Inertia of Large Numbers'

The principle of inertia of large numbers states that the larger the size of the sample, the more accurate the conclusion is likely to be. This principle is based on the notion that large numbers are more stable in their characteristics than small numbers, and that the variation in the aggregate of large numbers is insignificant. It does not mean that there is no variation in large numbers; there is, but it is less than in small numbers.
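A rough way to see this principle at work is to draw many samples of a small size and many of a large size, and compare how much the sample means vary. The following is a minimal sketch; the universe, the two sample sizes (25 and 400), the number of repetitions and the seed are all assumptions made for illustration.

```python
import random
import statistics


def spread_of_sample_means(population, n, repetitions=200, seed=0):
    """Standard deviation of the means of repeated random samples of size n."""
    rng = random.Random(seed)
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repetitions)]
    return statistics.pstdev(means)


# Hypothetical universe: the integers 1..10000.
population = list(range(1, 10001))

small_sample_spread = spread_of_sample_means(population, n=25)
large_sample_spread = spread_of_sample_means(population, n=400)

print(f"spread of means, n = 25 : {small_sample_spread:.1f}")
print(f"spread of means, n = 400: {large_sample_spread:.1f}")
```

The means of the larger samples still vary, but noticeably less than those of the smaller samples.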
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the results obtaj ¢ Principles talk ab 
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This pring} Zation alned b Ne techni © about the parameters ¢ 
design With Ple make ay - WMiques of probability sampling: 
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“a S Provided by the total expe! 

CONSists of J 


Of Cost 


| 
Conducted on the target audience 
Close-ended and ve 


naire, the 
IS not 
5.10 Sample Selection and Determination of Sample Size
In sampling technique, the size of the sample should be large enough to give a confidence interval of the preferred width. To determine the size of the sample, the researcher should keep in mind the following points:
a) Nature of universe
b) Number of classes proposed
c) Nature of study
d) Type of sampling
e) Standard of accuracy and acceptable confidence level
f) Availability of finance
g) Other considerations

For estimating a mean, the sample size is

$n = \left(\dfrac{Z_\alpha \, \sigma}{E}\right)^2$

where n = sample size, $Z_\alpha$ = standard normal variate (normal table value), $\sigma$ = population standard deviation, E = permissible error (difference between sample statistic and population parameter).

For estimating a proportion, the sample size is

$n = \left(\dfrac{Z_\alpha}{E}\right)^2 PQ$

where P = probability of a certain characteristic and Q = 1 − P = probability of the non-happening of that characteristic.

Hence, the sample size depends upon the variation of the statistical data, the level of significance ($\alpha$) through the standard normal variate, and the permissible error, which is the difference between the sample statistic and the population parameter.
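The two sample size formulas can be sketched as small functions. This is an illustrative sketch, not part of the text: the function names are hypothetical, the Z value is assumed to be read from a normal table, and the inputs are taken from the first two solved examples of this section.

```python
def sample_size_for_mean(z, sigma, error):
    """n = (Z * sigma / E)^2, returned unrounded."""
    return (z * sigma / error) ** 2


def sample_size_for_proportion(z, p, error):
    """n = (Z / E)^2 * P * Q with Q = 1 - P, returned unrounded."""
    q = 1 - p
    return (z / error) ** 2 * p * q


# Mean: 95% confidence (Z = 1.96), sigma = 10, permissible error E = 2.
n_mean = sample_size_for_mean(z=1.96, sigma=10, error=2)

# Proportion: Z = 3, P = 0.35, permissible error E = 0.05.
n_prop = sample_size_for_proportion(z=3, p=0.35, error=0.05)

print(f"n for the mean       = {n_mean:.2f}")
print(f"n for the proportion = {n_prop:.2f}")
```

The functions return the unrounded value on purpose. The solved examples round to the nearest whole number; rounding up instead is the more cautious choice when the permissible error must not be exceeded.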
Solved Examples

Example 5.1 The mean and standard deviation of a random sample of 50 were found to be 100 and 10. An investigator wants to be 95% confident that the error in the estimated mean should not exceed ±2. How many additional observations are required?

Solution:
Here, sample size (n) = 50 (first taken)
Sample mean ($\bar{X}$) = 100
Standard deviation ($\sigma$) = 10
Permissible error (E) = ±2
For 95% confidence, $Z_\alpha$ = 1.96

Now, sample size $n = \left(\dfrac{Z_\alpha \sigma}{E}\right)^2 = \left(\dfrac{1.96 \times 10}{2}\right)^2 = 96.04 \approx 96$ nearly.

Hence, the number of additional observations required = 96 − 50 = 46.

Example 5.2 Find the sample size when the probability is 0.35, Z = 3 and the permissible error is 5%.

Solution: P = 0.35, Q = 1 − 0.35 = 0.65, Z = 3, E = 5% = 0.05

By formula, $n = \left(\dfrac{Z}{E}\right)^2 PQ = \left(\dfrac{3}{0.05}\right)^2 \times 0.35 \times 0.65 = 819$

Required sample size is 819.


Example 5.3 A manufacturing concern wants to estimate the average amount of purchase of its production in a month by the customers. If the standard deviation is Rs. 10, find the sample size if the maximum error is not to exceed Rs. 3 with 99% confidence.

Solution: Standard deviation $\sigma$ = 10
Level of confidence = 99%
Level of significance = 1%, so $\alpha$ = 0.01
Then, $Z_{0.01}$ = 2.58
E = 3

$n = \left(\dfrac{Z_\alpha \sigma}{E}\right)^2 = \left(\dfrac{2.58 \times 10}{3}\right)^2 = 73.96 \approx 74$


hild Spends c =a 
watchin er the 
n the populatio . hour of the = g television ov 
lation to b 
to be 95% confident that i hours. What sample size should be 


Studies have slinw: 
taken for this 
Solution: Given,
Maximum allowable error (E) = |±1| = 1
Population S.D. ($\sigma$) = 3
Sample size (n) = ?
Confidence level (1 − $\alpha$) = 0.95
Risk ($\alpha$) = 0.05
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er the 
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A research workers Wisheg y. 
ample. mt ie. 


The probability is 0.95 that the sample mean will not differ from the population mean by more than 25% of the standard deviation. How large a sample should be taken?

Solution: Given, probability (confidence level) = 0.95
$\alpha$ = 1 − 0.95 = 0.05
$Z_\alpha = Z_{0.05}$ = 1.96
Error (E) = 25% of s.d. = $\dfrac{25}{100}\sigma = 0.25\sigma$

$n = \left(\dfrac{Z_\alpha \sigma}{E}\right)^2 = \left(\dfrac{1.96 \times \sigma}{0.25 \times \sigma}\right)^2 = 61.47$
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timate the ~ 
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Solution: Given, Po i 
Population standard deviation ($\sigma$) = 5000
Probability = 0.95
$\alpha$ = (1 − prob.) = (1 − 0.95) = 0.05
$Z_\alpha = Z_{0.05}$ = 1.96
Error (E) = 600

Now,
Sample size $n = \left(\dfrac{Z_\alpha \sigma}{E}\right)^2 = \left(\dfrac{1.96 \times 5000}{600}\right)^2 = 266.78$

∴ n = 267
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| 
In measuring reaction time, a psychologist estimates that the standard deviation is 0.05 sec. How large a sample of measurements must be taken in order to be 95% confident that the error of his estimate of the mean will not exceed 0.01 sec?

Solution: Given, population S.D. ($\sigma$) = 0.05
Confidence level (1 − $\alpha$) = 95% = 0.95
$\alpha$ = 0.05, $Z_\alpha = Z_{0.05}$ = 1.96
Error (E) = 0.01

We have,
$n = \left(\dfrac{Z_\alpha \sigma}{E}\right)^2 = \left(\dfrac{1.96 \times 0.05}{0.01}\right)^2 = 96.04$

∴ n = 96

It is desired to estimate the proportion of junior executives who change their first job within the first five years. This proportion is to be estimated within 3 percent of error, and a 0.95 level of confidence is to be used. A study conducted several years ago revealed that 30% of such executives changed their first job within five years.
i) How large a sample is required to update the study?
ii) How large should the sample be if no such previous estimates are available?

Solution: Given, confidence level 1 − $\alpha$ = 0.95; $\alpha$ = 0.05
$Z_\alpha = Z_{0.05}$ = 1.96
P = 30% = 0.3
Q = 1 − P = 0.7
Error (E) = 3% = 0.03

(i) We have,
Sample size $n = \left(\dfrac{Z_\alpha}{E}\right)^2 PQ = \left(\dfrac{1.96}{0.03}\right)^2 \times 0.3 \times 0.7 = 896.37$
∴ n = 896

(ii) Since no previous estimates (i.e., P and Q) are available, we take P = Q = 0.5.
Sample size $n = \left(\dfrac{Z_\alpha}{E}\right)^2 PQ = \left(\dfrac{1.96}{0.03}\right)^2 \times 0.5 \times 0.5 = 1067.11$
∴ n = 1067

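A political pollster wants to estimate the proportion of voters who will vote for the democratic candidate in a presidential campaign. The pollster wishes to have 90% confidence that her prediction is correct to within ±0.04 of the population proportion.

i) What sample size is needed?
ii) If the pollster wants to have 95% confidence, what sample size is needed?
iii) If she wants to have 95% confidence and a sampling error of ±0.03, what sample size is needed?

Solution:
i) Given, confidence level (1 − $\alpha$) = 0.90
Risk ($\alpha$) = 0.10, $Z_\alpha = Z_{0.10}$ = 1.645
Maximum allowable error (E) = |±0.04| = 0.04
Sample size (n) = ?
If previous estimates (i.e., P and Q) are not available, we assume them to be equal:
P = Q = 0.50
∴ Required sample size $n = PQ\left(\dfrac{Z_\alpha}{E}\right)^2 = 0.5 \times 0.5 \times \left(\dfrac{1.645}{0.04}\right)^2 = 422.82 \approx 423$

ii) Confidence level (1 − $\alpha$) = 0.95, risk ($\alpha$) = 0.05
$Z_\alpha = Z_{0.05}$ = 1.96, E = 0.04
If previous estimates (i.e., P and Q) are not given, we assume P = Q = 0.50.
Required sample size $n = PQ\left(\dfrac{Z_\alpha}{E}\right)^2 = 0.5 \times 0.5 \times \left(\dfrac{1.96}{0.04}\right)^2 = 600.25 \approx 600$

iii) Confidence level (1 − $\alpha$) = 0.95, $Z_\alpha$ = 1.96, E = 0.03
If previous estimates (i.e., P and Q) are not given, we assume P = Q = 0.50.
Required sample size $n = PQ\left(\dfrac{Z_\alpha}{E}\right)^2 = 0.5 \times 0.5 \times \left(\dfrac{1.96}{0.03}\right)^2 = 1067.11 \approx 1067$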
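The three pollster scenarios can also be checked without a normal table by computing the standard normal variate from the confidence level. The sketch below assumes a two-sided interval and uses Python's standard-library `statistics.NormalDist`; the function names are hypothetical, and P = Q = 0.5 is the same assumption made in the solution when no previous estimate exists.

```python
from statistics import NormalDist


def z_value(confidence):
    """Two-sided standard normal variate for a confidence level such as 0.95."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def sample_size_for_proportion(confidence, error, p=0.5):
    """n = (Z / E)^2 * P * (1 - P), returned unrounded."""
    z = z_value(confidence)
    return (z / error) ** 2 * p * (1 - p)


# (confidence level, maximum allowable error) for parts i, ii and iii.
cases = {"i": (0.90, 0.04), "ii": (0.95, 0.04), "iii": (0.95, 0.03)}
sizes = {label: sample_size_for_proportion(c, e) for label, (c, e) in cases.items()}

for label, n in sizes.items():
    print(f"({label}) n = {n:.2f}, about {round(n)}")
```

Because the computed Z values carry more decimals than the table values 1.645 and 1.96, the unrounded sizes differ slightly in the decimals from a hand calculation, while the rounded sizes agree.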
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Population size (N) ee 
Population mean (y,) i size (n) 
‘ : am: ie 
Population standard deviation (c) m ple mean(;) 
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Population proportion (p) : mple standard deviation (s) 
é ; ample proporti 
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Population coefficient of skewness 8 Sampl : ation coefficient (r) 
; ple coefficient of sk 
Population coefficient o dea : ewness (5)) 
f Kurtosis (, ) etc. Sample coefficient of kurtosis (b2) etc. 
Standard Error (S.E.)
The standard deviation of the sampling distribution of a sample statistic is known as the standard error (S.E.) of the statistic. Thus, the standard error of a statistic t is given by

$\text{S.E.}(t) = \sqrt{\operatorname{Var}(t)} = \sqrt{\dfrac{1}{k}\sum (t - \bar{t})^2}$

where k is the number of possible samples.

The standard deviation of the distribution of the sample mean is called the standard error of the sample mean. It is denoted by $\sigma_{\bar{X}}$ or S.E.($\bar{X}$). The standard error of the mean is a measure of dispersion of the sampling distribution of the sample mean.


g distribution of proportion 
own statistics are 
P is the 


viation of the samplin 
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ard € 
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5.11 Sampling and 


Sampling errors arise due to the fact that only a part of the population (i.e. a sample) has been used to estimate the population parameters and draw inferences about the population. So, sampling errors are absent in a complete enumeration survey or census. The magnitude of sampling errors depends on the nature of the population and the size of the sample. If the sample size increases, the sampling errors decrease.

When you survey a sample, your interest usually goes beyond just the people in the sample. Rather, you are trying to get information to project onto a larger population. For this reason, it is important to understand common sampling errors so you can avoid them.

Five Common Types of Sampling Errors:

Population Specification Error: This error occurs when the researcher does not understand who they should survey. For example, imagine a survey about breakfast cereal consumption. Who should be surveyed? It might be the entire family, the mother, or the children. The mother might make the purchase decision, but the children influence her choice.

Sample Frame Error: A frame error occurs when the wrong sub-population is used to select a sample. A classic frame error occurred in the 1936 presidential election between Roosevelt and Landon. The sample frame was taken from car registrations and telephone directories. In 1936, many Americans did not own cars or telephones, and those who did were largely Republicans. The results wrongly predicted a Republican victory.

Selection Error: This occurs when respondents self-select their participation in the study, so that only those who are interested respond. Selection error can be reduced by making an extra effort to get participation, for example through pre-survey contact and a follow-up if a response is not received.
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Non-s > 


i) Faulty planning or definitions
ii) Response errors
iii) Non-response errors
iv) Compiling errors
v) Publication errors
vi) Coverage error


Exercise 5.1
1. What is sampling? Describe the various methods of sampling with their merits and demerits.
2. The business manager of a large company wants to check the inventory records against the physical inventories by a sample survey. He wants to be almost sure that the sampling error is not more than 5% above or below the true proportion of inaccurate records. The proportion of inaccurate records is estimated at 35% from past experience. Determine the sample size.
3. Write a short note on the organizational aspects of a sample survey.
4. How do you determine the sample size in any investigation?
5. What do you mean by questionnaire design?
6. Write the different types of questionnaire design.
7. What are the criteria for good questionnaire design?
8. Write the difference between sampling and non-sampling errors.
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(b) Real population 
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¢ . ials | 

) Infinite population raally repeated trials 1s 


(b) finite population 
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hypothetical population (d) real population 


— jhe wrong formulae 
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( Tong Columns or cells (d) 
"8s of sample 


. rs 
None sampling ci 


istics for BCA 
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ced by a oe 
4. Sampling error in sample may D° ee (b) decreasing in the sampling size 


(a) Increasing the sample size a ee 
le size 
No charge in samp 
: ample size 1s done by formula 
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_ Determination of s 3 0) 
rs rn » 6 
(a) (2) (>) \E 
ize is done by 


6. Determination of sample si 


 ) w (2) = © PO () 1-P 


E 
7. There are more chances of non-sampling errors than sampling errors in case of
(a) studies of large samples (b) complete enumeration
(c) inefficient investigations (d) all the above

8. Increase in reliability and accuracy of results from a sampling study with the increase in sample size is known as the principle of
(a) optimization (b) statistical regularity
(c) law of increasing returns (d) inertia of large numbers

9. The magnitude of the standard error of an estimate is an index of its
(a) accuracy (b) precision (c) efficiency (d) all the above
nswer Ke 4) 
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ii) Non rand : ampling) 0us Sampling 
ii) om sampling (Non- Probabj]; ; 
ii) Mixed sampling ae 
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Some example of survey in practice of Nepal 
AHS: Annual Household Survey
CLS: Crops and Livestock Survey
NDHS: Nepal Demographic and Health Survey
NLFS: Nepal Labour Force Survey
NMICS: Nepal Multiple Indicator Cluster Survey
NLSS: Nepal Living Standards Survey

On a large scale, local governments, cities, states and countries are making increased use of sample surveys to obtain the information needed for future planning, development and meeting pressing problems.


6.2 Random Sampling (Probability Sampling) 


Random sampling or probability sampling is the scientific method of selecting samples from the population according to some laws of chance, in which each and every unit in the population has some definite pre-assigned probability of being selected in the sample. Probability samples are selected in such a way as to be representative of the population. Probability sampling takes various forms:

i. Each sample unit has an equal chance of being selected.
ii. Sampling units have different probabilities of being selected.
iii. Probability of selection of a unit is proportional to the sample size.
ch a way that 
ar | selected from 
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un! 


each and every unit in the population ne 
Ty P mon ac 
ost com ynit at ¢ 
for ea¢ mplin 
3 tary random S* 


the population. It is the simplest 2N" =. ctectio 

drawn unit by unit, with equal ania ost ele 

4s the equal probability sampling: . gs can be us? 
based on the theory of expectation. sallowing sg methoes 

he fol 
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in SRSWOR is Net | 
ii. SRSWR: If a unit is selected and noted and then returned or replaced back in the population before making the next draw, then the sampling procedure is called simple random sampling with replacement (SRSWR). There are $N^n$ possible ordered samples of size n from a population of size N in SRSWR, so the probability of drawing any particular one of them is $\dfrac{1}{N^n}$.
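The difference between sampling without and with replacement can be seen by listing every possible sample for a tiny population. The sketch below is illustrative only: the population of five units and the sample size of two are hypothetical, and it relies on the standard counts of $^{N}C_{n}$ unordered samples without replacement and $N^n$ ordered samples with replacement.

```python
import random
from itertools import combinations, product
from math import comb

population = [1, 2, 3, 4, 5]  # hypothetical units, N = 5
n = 2                         # sample size

# SRSWOR: unordered samples, no unit can repeat.
srswor_samples = list(combinations(population, n))
# SRSWR: ordered samples, a unit can be drawn again.
srswr_samples = list(product(population, repeat=n))

print(len(srswor_samples), "samples without replacement; NCn =", comb(len(population), n))
print(len(srswr_samples), "samples with replacement; N^n =", len(population) ** n)

# Drawing one sample of each kind with the standard library.
rng = random.Random(1)
one_srswor = rng.sample(population, n)     # never repeats a unit
one_srswr = rng.choices(population, k=n)   # may repeat a unit
print(one_srswor, one_srswr)
```

Each of the listed samples is equally likely under its own scheme, which is why simple random sampling is also called equal probability sampling.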


Merits 


(a) Each item has an equal chance of being selected. So, selection depends upon chance and not on opinion, personal judgment, sequence, etc.

(b) This method is quite economical and comparatively saves time and money.

(c) It is more representative of the population as compared to judgment or purposive sampling.

Demerits
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tified sample is a muni-reproduction of the 
population. Before sampling, the population is divided into groups by characteristics of importance for the research, for example by gender, social class, education level, religion, etc. Then the population is randomly sampled within each category or stratum.

A stratified sampling may be either (i) proportionate or (ii) disproportionate. In proportionate stratified sampling, the number of items drawn from each stratum is proportional to the size of the stratum. On the other hand, if an equal number of items is drawn from each stratum regardless of how the stratum is represented in the population (universe), it is called disproportionate stratified sampling.

Merits:
(a) The units selected represent the whole universe.
(b) The estimation of population parameters is more efficient.
(c) For a large and heterogeneous population, stratified sampling is the best design.

Demerits
(a) This method requires more time and cost.
(b) If each stratum of the population is not homogeneous, the result obtained may not be reliable.
(c) The samples from each stratum should be selected only by experts or experienced persons.
(iii) Systematic Sampling: A random sampling in which only the first unit is selected at random and the remaining units are automatically selected according to a pre-determined pattern (i.e. at fixed equal intervals from one another) is known as systematic sampling. Systematic sampling is a commonly used technique if a complete and up-to-date sampling frame, i.e. a complete and up-to-date list of sampling units, is available.

Suppose N units of the population are numbered from 1 to N in some order. Let N = nk, where n is the sample size and k is an integer known as the sampling interval. Thus, $k = \dfrac{N}{n}$.

Systematic sampling is a statistical method involving the selection of elements from an ordered sampling frame. This is random sampling with a system. From the sampling frame, a starting point is chosen at random, and choices thereafter are at regular intervals. For example, suppose you want to sample 8 houses from a street of 120 houses. 120/8 = 15, so every 15th house is chosen after a random starting point between 1 and 15. If the random starting point is 11, then the houses selected are 11, 26, 41, 56, 71, 86, 101 and 116. This sampling is mostly used in forest surveys, fisheries surveys etc. In general, there are two methods: (i) the linear method, when N = nk, and (ii) the non-linear method, when N ≠ nk.
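The linear method can be sketched as follows, using the street of 120 houses from the text. The function name and its parameters are hypothetical, and the sketch assumes N is an exact multiple of n, so it does not cover the non-linear case.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Linear systematic sample of unit numbers 1..N, assuming N = n * k."""
    k = population_size // sample_size  # sampling interval
    if start is None:
        # Random start between 1 and k, both inclusive.
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(sample_size)]


# 8 houses from a street of 120 houses, with the random start fixed at 11.
houses = systematic_sample(120, 8, start=11)
print(houses)

# The same design with a randomly chosen start.
random_start_houses = systematic_sample(120, 8, seed=3)
print(random_start_houses)
```

Only the starting point is random; once it is chosen, every other selected unit is determined by the interval k.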


Merits
(a) This method is simple and convenient to use.
(b) Selecting the sample by this method takes less time and labour.
(c) Most of the results obtained from this method are satisfactory.
(d) If the complete list of the population is available and if the items are arranged systematically, this method is more efficient.

Demerits
(a) The sample selected may not represent the population in some cases.
(b) The items of the population should be arranged in some order, otherwise the result obtained may be misleading.

(iv) Cluster Sampling: Cluster sampling is the method or technique of random sampling in which the population is divided into different groups called clusters, in such a way that the characteristics within a cluster are heterogeneous and between the clusters are homogeneous, so that the clusters are nearly the same as one another. Then a cluster is selected at random as a sample which is representative of the population as a whole. This type of random sampling is called cluster sampling.

For example, suppose we want to study the economic condition of people of Kathmandu metropolitan city. The city is divided into different wards in such a way that the economic condition of people within a ward is heterogeneous and between wards is homogeneous. Then, a ward is selected as a sample by using the simple random sampling method, and we can study the economic condition of people in Kathmandu metropolitan city.
Merits
(a) It is less costly than simple random sampling and stratified sampling.
(b) It is useful even when the sampling frame of elements is not available.
(c) Selection of elements (units) by well-designed cluster sampling procedures is easier, faster, cheaper and more convenient than simple random sampling and stratified sampling.

Demerits
(a) The efficiency decreases with an increase in cluster size.
(b) The cost per unit may be more in cluster sampling.
(c) Enumeration of the sampling units within the selected clusters is difficult when the population is large.
(v) Multi-stage Sampling: Multi-stage sampling is a further development of the principle of cluster sampling. It is a random sampling in which the sampling procedure is carried out in various stages. At the first stage, the population is divided into clusters, called first stage units (fsu) or primary units, and a sample of them is selected. Further sampling is then done within the selected clusters.


(v 


— 


similariti eae: 

eas SONA wh eS, th ‘ fferel 

ae d, either in single- or om all the strata ; ey are substantially d : 
op ; Multi-stage, » Where in cluster sampling only 


0 a cron ; 
as Op 1 ‘ 7 ; ‘ 


: 1 nate 
(@ Itis mor and a plot of fixed size as the ulti™ 


€ 
(0) Iris Cie 


(C) Thi in ti : 
4) - Method is also aTRe scale Surve "Bation ig Very lar 
$ Sample size j Ore €Xible vs Be. 
S Teduceg ities ‘ @0 Other samp| 
Ch sta Plin 
Be, this g Method 
Sam 5 S. 


NB techn; 
hnique Saves time and cost. 


(ii) 


ster 


Non-random sampling (non-probability sampling) is a sampling method in which the choice of selection of sampling units depends entirely on the discretion or judgment, convenience or beliefs of the sampler or investigator. This method is mainly used for opinion surveys but cannot be recommended for general use, as it is subject to the drawbacks of prejudice and bias of the investigator.

However, it is possible that judgment sampling may give useful results. This method suffers from a serious defect: it is not possible to compute the degree of precision of the estimate from the sample values.

Types of non-random sampling or non-probability sampling are as follows:


(i) Judgment sampling: A sampling method in which the choice of sample items depends entirely upon the judgment of the investigator is called judgment sampling. In other words, the investigator uses his or her own judgment in the choice and includes only those items of the universe which are convenient to the investigator. It is a method for quick decisions.

For instance, if we want to study corruption in Nepalese society, we can select a sample of the senior professors of T.U. to give their opinion on the subject. We consider that the judgment of these professors is much superior to that of a convenience sample. Then we can get the desired information. It is not based on the theory of expectation.

Merits:
(a) It is a simple method of sampling for quick decisions.
(b) It gives a better result when the sample size is small.

Demerits:
(a) It gives unreliable conclusions if the investigator is personally biased.
(b) Though simple, the method is not scientific and is not in general use.
(c) Sampling error cannot be estimated because it is not based on random sampling.

(ii) Convenience sampling (Accidental sampling): A sampling method in which the sample is selected neither by probability nor by judgment but by convenience is called convenience sampling. It is also called accidental sampling. Selection of the sample in this method is totally based on the convenience of the researcher. The result obtained by this method is generally unsatisfactory, as such samples are hardly representative of the population. They are generally biased. It is used for making pilot studies.

For instance, if an interviewer is required to conduct on-the-spot interviews, he/she stands at a corner of the street and interviews passers-by. Thus, the required information can be obtained.

Merits:
(a) It is useful for making pilot studies.
(b) It is a simple method of sampling for quick decisions.
(c) Convenience sampling is widely used when both time and money are limited, i.e. it is less costly and less time consuming.

Demerits:
(a) The results obtained by this method can hardly be representative of the population.
(b) Sampling error cannot be estimated because it is not based on random sampling.

(iii) Quota sampling: A non-random sampling method in which researchers are given quotas to be filled from different strata, and are free to select the required samples from these strata within the pre-assigned quotas, is called quota sampling. Quota sampling is a type of judgment sampling. Sample quotas may be fixed according to some specific characteristics such as income group, sex, occupation, political or religious affiliation etc.

For instance, to collect comments about the fiscal year budget in a radio listening survey, the interviewer may be asked to choose from different areas, such as 20 officials, 10 professors, 10 businessmen and 5 students. Here, the interviewer is free to select the people to be interviewed for the comment.

Merits:
(a) It saves time and money compared with other sampling methods.
(b) It is stratified-cum-purposive, so the investigator enjoys the benefits of both.

Demerits:
(a) It may be biased because of the personal beliefs and prejudices of the investigator.
(b) Sampling error cannot be estimated because it is also not based on random sampling.


(iv) Snowball sampling: Snowball sampling is a special type of non-probability sampling used when the desired sample characteristic is very rare. Therefore, this sampling is suitable in applications where respondents are difficult to identify and are best located through referral networks. It is also known as chain referral sampling or network sampling. In this sampling, an initial group is discovered, and then subsequent respondents possessing similar characteristics are identified based on referrals provided by the initial respondents. It is used, for example, in the study of political activities, illegal activities etc.

Merits:

Demerits:
(a) It is difficult to apply when the population is large.
(b) It does not ensure the inclusion of all elements in the list.


Ws 


eR ee 


iple 


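Worked Out Example
A population variable consists of the values 1, 2, 3, 4, 5.
(i) Draw all possible samples of size two which can be drawn from the population without replacement.
(ii) Show that the mean of the sampling distribution of the sample mean is equal to the population mean.
(iii) Calculate the standard error of the sampling distribution of the sample mean.

Solution: Here, population size (N) = 5, sample size (n) = 2.

i) Possible number of samples of size 2 which can be drawn from the population without replacement = $^{N}C_{n} = {}^{5}C_{2} = 10$.
Possible samples are (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5).

Population mean $(\mu) = \dfrac{1 + 2 + 3 + 4 + 5}{5} = 3$

ii) Calculation of the sampling distribution of means

Sample No.   Sample    Sample mean ($\bar{X}$)   $(\bar{X} - \mu)^2$
1            (1, 2)    1.5                       2.25
2            (1, 3)    2.0                       1.00
3            (1, 4)    2.5                       0.25
4            (1, 5)    3.0                       0.00
5            (2, 3)    2.5                       0.25
6            (2, 4)    3.0                       0.00
7            (2, 5)    3.5                       0.25
8            (3, 4)    3.5                       0.25
9            (3, 5)    4.0                       1.00
10           (4, 5)    4.5                       2.25
Total                  30                        7.50

Mean of sample means $= \dfrac{\sum \bar{X}}{10} = \dfrac{30}{10} = 3 = \mu$

Hence, the mean of the sampling distribution of means is equal to the population mean, i.e., $E(\bar{X}) = \mu$.

iii) Calculation of the standard error of the sampling distribution of the sample mean

$\operatorname{Var}(\bar{X}) = \dfrac{\sum (\bar{X} - \mu)^2}{10} = \dfrac{7.5}{10} = 0.75$

Now, the standard error of the mean is

$S.E.(\bar{X}) = \sqrt{\operatorname{Var}(\bar{X})} = \sqrt{0.75} = 0.866$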


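Example: A universe consists of the values 3, 4, 5, 6, 7.
(i) Select all samples of size 2 which can be drawn from the population without replacement.
(ii) Calculate the sample means.
(iii) Compare the mean of the sample means with the mean of the universe.
(iv) Calculate the standard error of the sample mean.

Solution:
i) Possible number of samples $= {}^{N}C_{n} = {}^{5}C_{2} = 10$
Possible samples are: (3, 4), (3, 5), (3, 6), (3, 7), (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7)

ii) Calculation of sample means

Sample number   Sample    Sample mean ($\bar{X}$)
1               (3, 4)    3.5
2               (3, 5)    4.0
3               (3, 6)    4.5
4               (3, 7)    5.0
5               (4, 5)    4.5
6               (4, 6)    5.0
7               (4, 7)    5.5
8               (5, 6)    5.5
9               (5, 7)    6.0
10              (6, 7)    6.5
Total                     50

iii) Calculation of mean of universe and mean of the sample means.

Mean of universe = Population mean $= \dfrac{3 + 4 + 5 + 6 + 7}{5} = 5$
Mean of the sample means $= \dfrac{\sum \bar{X}}{^{5}C_{2}} = \dfrac{50}{10} = 5$

Hence, the mean of the sample means is equal to the population mean, i.e., $E(\bar{X}) = \mu$.

iv) Calculation of standard error of the sampling distribution of the sample mean

$\operatorname{Var}(\bar{X}) = \dfrac{\sum (\bar{X} - \mu)^2}{10} = \dfrac{7.5}{10} = 0.75$, so $S.E.(\bar{X}) = \sqrt{0.75} = 0.866$.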


Example: A random sample of size 36 is drawn from a finite population consisting of 101 units. If the population standard deviation is 12.6, find the standard error of the sample mean when the sample is drawn (i) with replacement (ii) without replacement.

Solution: Here, sample size, n = 36; population size, N = 101; population s.d., $\sigma$ = 12.6

(i) If the sample is drawn with replacement, then the standard error of the sample mean is

$S.E.(\bar{X}) = \dfrac{\sigma}{\sqrt{n}} = \dfrac{12.6}{\sqrt{36}} = 2.1$

(ii) If the sample is drawn without replacement, then the standard error of the sample mean is

$S.E.(\bar{X}) = \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}} = 2.1 \times \sqrt{\dfrac{101 - 36}{101 - 1}} = 2.1 \times 0.8062 = 1.693$
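The two standard-error formulas used in this example can be checked numerically. The following is a minimal sketch; the function name `se_mean` and its arguments are illustrative choices, not notation from the text.

```python
import math

def se_mean(sigma, n, N=None):
    """Standard error of the sample mean.

    sigma : population standard deviation
    n     : sample size
    N     : population size; None means sampling with replacement
            (no finite population correction is applied)
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        # finite population correction for sampling without replacement
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(round(se_mean(12.6, 36), 4))         # with replacement
print(round(se_mean(12.6, 36, N=101), 4))  # without replacement
```

The correction factor $\sqrt{(N-n)/(N-1)}$ is always below 1 for n > 1, so sampling without replacement gives the smaller standard error.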


Example 6.4: A simple random sample of size 9 is drawn without replacement from a finite population consisting of 25 units. If the number of defective units in the population is 5, find the standard error of the sample proportion of defectives.

Solution: Here,
Sample size, n = 9; Population size, N = 25

Population proportion of defective units, $P = \dfrac{5}{25} = \dfrac{1}{5}$, $Q = 1 - P = \dfrac{4}{5}$

If the sample is drawn without replacement, the S.E. of the sample proportion of defectives is

$S.E.(p) = \sqrt{\dfrac{PQ}{n} \cdot \dfrac{N - n}{N - 1}} = \sqrt{\dfrac{\frac{1}{5} \times \frac{4}{5}}{9} \times \dfrac{25 - 9}{25 - 1}} = 0.1089$
Example 6.5: A random sample of 500 oranges was taken from a large consignment and it was observed that 65 were found to be bad. Find the standard error of the proportion of bad oranges.

Solution: Here, sample size, n = 500; bad oranges in the sample = 65

Sample proportion, $p = \dfrac{65}{500} = 0.13$, $q = 1 - p = 0.87$

So, the standard error of the sample proportion of bad oranges (for a large population) is

$S.E.(p) = \sqrt{\dfrac{PQ}{n}} = \sqrt{\dfrac{pq}{n}}$ (since $p \approx P$ for large samples)
$= \sqrt{\dfrac{0.13 \times 0.87}{500}} = 0.015$
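As a quick numerical check of Examples 6.4 and 6.5, the sketch below computes the standard error of a proportion with and without the finite population correction. The helper name `se_proportion` is a hypothetical choice for illustration.

```python
import math

def se_proportion(p, n, N=None):
    """Standard error of a sample proportion.

    p : proportion (population P, or sample p when P is unknown)
    n : sample size
    N : population size; None means a large population or sampling
        with replacement, so no correction factor is applied
    """
    se = math.sqrt(p * (1.0 - p) / n)
    if N is not None:
        # finite population correction (sampling without replacement)
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(round(se_proportion(5 / 25, 9, N=25), 4))  # Example 6.4
print(round(se_proportion(65 / 500, 500), 4))    # Example 6.5
```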


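Example: A population consists of five numbers 1, 3, 5, 7, 9.
(i) Enumerate all possible samples of size two which can be drawn from the population without replacement.
(ii) Calculate the mean and variance of the population.
(iii) Show that the mean of the sampling distribution of the sample mean is equal to the population mean.
(iv) Calculate the variance of the sampling distribution of the sample mean.
(v) Calculate the standard error of the mean.

Solution: Here, N = 5, n = 2.

i. Number of possible samples $= {}^{N}C_{n} = {}^{5}C_{2} = 10$
Thus the possible samples are (1, 3), (1, 5), (1, 7), (1, 9), (3, 5), (3, 7), (3, 9), (5, 7), (5, 9), (7, 9).

ii. Calculation of population mean and population variance:

Population mean $= \bar{Y} = \dfrac{\sum Y}{N} = \dfrac{25}{5} = 5$
Population variance $= \sigma^2 = \dfrac{\sum (Y - \bar{Y})^2}{N} = \dfrac{40}{5} = 8$

iii. Calculation of sample means and variance of the sampling distribution of means:

Sample No.   Sample    Sample mean ($\bar{y}$)   $(\bar{y} - 5)^2$
1            (1, 3)    2                         9
2            (1, 5)    3                         4
3            (1, 7)    4                         1
4            (1, 9)    5                         0
5            (3, 5)    4                         1
6            (3, 7)    5                         0
7            (3, 9)    6                         1
8            (5, 7)    6                         1
9            (5, 9)    7                         4
10           (7, 9)    8                         9
Total                  50                        30

Mean of the sample means $= \dfrac{\sum \bar{y}}{10} = \dfrac{50}{10} = 5$

Since the mean of the sample means is 5, which is equal to the population mean $\bar{Y} = 5$, we conclude that the mean of the sampling distribution of the sample means is equal to the population mean.

iv. Variance of the sample means is

$V(\bar{y}) = \dfrac{\sum (\bar{y} - \bar{Y})^2}{10} = \dfrac{30}{10} = 3$

v. The standard error of the sample mean is given by

$S.E.(\bar{y}) = \sqrt{V(\bar{y})} = \sqrt{3} = 1.732$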
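The enumeration in this example is small enough to verify by brute force. The sketch below lists every sample of size 2 without replacement from the population 1, 3, 5, 7, 9 and compares the variance of the sample means with the textbook formula $\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$; the variable names are illustrative only.

```python
from itertools import combinations
from statistics import mean, pvariance
import math

population = [1, 3, 5, 7, 9]
n = 2
N = len(population)

# all samples of size n drawn without replacement (order ignored)
samples = list(combinations(population, n))
sample_means = [mean(s) for s in samples]

pop_mean = mean(population)           # population mean
pop_var = pvariance(population)       # population variance, divisor N
mean_of_means = mean(sample_means)    # should equal pop_mean
var_of_means = pvariance(sample_means)

# theoretical variance of the sample mean under SRSWOR
theory_var = pop_var / n * (N - n) / (N - 1)

print(len(samples), pop_mean, pop_var)
print(mean_of_means, var_of_means, theory_var)
print(round(math.sqrt(var_of_means), 3))  # standard error
```

Replacing `combinations` with `itertools.product(population, repeat=n)` would enumerate the $N^n$ ordered samples that arise under sampling with replacement.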


Example: Enumerate all possible samples of size 2 taken from the population whose elements are 6, 4, 3, ..., 5 by simple random sampling with replacement, and find the mean and variance of the sampling distribution of the sample mean.

Solution: Population size N = 5, sample size n = 2
Possible number of samples of size 2 which can be drawn from the population with replacement $= N^n = 5^2 = 25$ ways.
Possible samples are
(6, 6) (6, 4) (6, 3) ... (6, 5)
(4, 6) (4, 4) (4, 3) ... (4, 5)
(3, 6) (3, 4) (3, 3) ... (3, 5)
...
(5, 6) (5, 4) (5, 3) ... (5, 5)

Mean and variance are calculated by the same process as before.


Mixed Sampling


If the samples are selected partly according to some laws of chance and partly according to a fixed sampling rule (i.e. no assignment of probabilities), they are termed mixed samples and the technique of selecting such samples is known as mixed sampling. In a survey, both probability and non-probability sampling can be applied.


Precision of an Estimate


The precision of an estimate is defined as the reciprocal of its variance. For example, in the case of simple random sampling without replacement (SRSWOR),

$V(\bar{y}) = \dfrac{N - n}{N} \cdot \dfrac{S^2}{n} = (1 - f)\dfrac{S^2}{n}$

Hence, Precision $= \dfrac{1}{V(\bar{y})} = \dfrac{n}{(1 - f) S^2}$

The precision can be increased (i) when the sample size is increased, (ii) when the sampling fraction (f) is increased, and (iii) when $S^2$, the variability of the sampling units in the population, is reduced.


Ratio and Regression Method of Estimation under Simple and Stratified Random Sampling

Introduction of Auxiliary Information: Ratio and Regression Estimation

If auxiliary information related to the character under study is available on all the population units, then it may be advantageous to use it in estimation. The knowledge of auxiliary information may be exploited to develop estimators which make use of it. The ratio estimator and the regression estimator are examples of such estimators. The auxiliary information is assumed to be available on all the sampling units, or to be obtainable easily without much burden on the cost.

The theory of simple random sampling estimates the population value by the arithmetic mean of the observations in the sample. There are other methods of estimation which make use of information on an auxiliary variable that is highly correlated with the variable under study. Such methods are more precise (accurate) and give more reliable estimates of the population values than those based on the simple average. These methods are:
(i) Ratio method of estimation
(ii) Regression method of estimation

In the ratio method an auxiliary variate $x_i$, correlated with $y_i$, is obtained for each unit in the sample. The population total $X$ of the $x_i$ must be known. In practice, $x_i$ is often the value of $y_i$ at some previous time when a complete census was taken. The aim is to obtain increased precision by taking advantage of the correlation between $y_i$ and $x_i$. If there exists a high correlation between $x_i$ and $y_i$, then the ratio $y_i / x_i$ varies little from unit to unit.

Let us assume that a simple random sample of size n is drawn from the population. Let $y_i$ and $x_i$ $(i = 1, 2, 3, \ldots, n)$ be the sample values, and let $y$ and $x$ be the sample totals of $y_i$ and $x_i$ respectively. Also let $X$ be the population total of $x_i$. Then the ratio estimate of the population total $Y$ of $y_i$ is

$\hat{Y}_R = \dfrac{y}{x} X = \dfrac{\bar{y}}{\bar{x}} X$

where $y$ and $x$ are the sample totals of $y_i$ and $x_i$ respectively.

If the quantity to be estimated is $\bar{Y}$, the population mean value of $y_i$, the ratio estimate is $\hat{\bar{Y}}_R = \dfrac{y}{x} \bar{X}$.

Similarly, in the case of stratified random sampling, $\hat{Y}_{Rs} = \sum_i \dfrac{y_i}{x_i} X_i$, where $y_i$ and $x_i$ are the sample totals in the $i$th stratum and $X_i$ is the stratum total.
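The ratio estimates of the population total and mean can be illustrated with a small numerical sketch. The data values and the known totals below are hypothetical, chosen only to make the arithmetic easy to follow, and the function names are not from the text.

```python
def ratio_estimate_total(y_sample, x_sample, X_total):
    """Ratio estimate of the population total: (y / x) * X,
    where y and x are the sample totals."""
    r_hat = sum(y_sample) / sum(x_sample)
    return r_hat * X_total

def ratio_estimate_mean(y_sample, x_sample, X_mean):
    """Ratio estimate of the population mean: (y / x) * X-bar."""
    r_hat = sum(y_sample) / sum(x_sample)
    return r_hat * X_mean

# hypothetical sample: x = auxiliary variable, y = variable under study
x = [10, 12, 8, 15, 5]      # sample total x = 50
y = [22, 25, 15, 31, 9]     # sample total y = 102
X_total = 500               # assumed known population total of x
N = 50                      # assumed population size

total_hat = ratio_estimate_total(y, x, X_total)
mean_hat = ratio_estimate_mean(y, x, X_total / N)
print(total_hat, mean_hat)
```

Both estimates use the same sample ratio $\hat{R} = y/x$; only the known auxiliary quantity it is multiplied by changes.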

Notation and Terminology
$x_i$ = Auxiliary variable, $y_i$ = Variable under study
$x$ = Sample total of $x_i$, $y$ = Sample total of $y_i$
$\bar{x}$ = Sample mean of $x_i$, $\bar{y}$ = Sample mean of $y_i$
$X_i$ = Auxiliary population variable
$X$ = Population total of $x_i$, $\bar{X}$ = Population mean of $x_i$
$Y$ = Population total of $y_i$, $\bar{Y}$ = Population mean of $y_i$
$\hat{Y}_R$ = Ratio estimate of population total $= \dfrac{y}{x} X = \dfrac{\bar{y}}{\bar{x}} X$
$\hat{\bar{Y}}_R$ = Ratio estimate of population mean $= \dfrac{y}{x} \bar{X} = \dfrac{\bar{y}}{\bar{x}} \bar{X}$
$B(\hat{R})$ = Bias of ratio estimate
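Theorem: The ratio estimates of the population total $Y$, the population mean $\bar{Y}$ and the population ratio $R = Y/X$ are

$\hat{Y}_R = \dfrac{y}{x} X, \qquad \hat{\bar{Y}}_R = \dfrac{y}{x} \bar{X}, \qquad \hat{R} = \dfrac{y}{x}$

respectively. In a simple random sample of size n (n large),

$V(\hat{Y}_R) = \dfrac{N^2 (1 - f)}{n} \cdot \dfrac{\sum_{i=1}^{N} (y_i - R x_i)^2}{N - 1}$

$V(\hat{\bar{Y}}_R) = \dfrac{1 - f}{n} \cdot \dfrac{\sum_{i=1}^{N} (y_i - R x_i)^2}{N - 1}$

$V(\hat{R}) = \dfrac{1 - f}{n \bar{X}^2} \cdot \dfrac{\sum_{i=1}^{N} (y_i - R x_i)^2}{N - 1}$

where $f = n/N$ is the sampling fraction.

Proof: We have

$\hat{R} - R = \dfrac{\bar{y}}{\bar{x}} - R = \dfrac{\bar{y} - R \bar{x}}{\bar{x}}$

If n is large, $\bar{x}$ may be replaced by $\bar{X}$ in the denominator, so that $E(\hat{R}) = R$ approximately and

$V(\hat{R}) = E(\hat{R} - R)^2 = E\left[\dfrac{\bar{y} - R \bar{x}}{\bar{X}}\right]^2 = \dfrac{1}{\bar{X}^2} E(\bar{y} - R \bar{x})^2$

Define a new variate $u_i = y_i - R x_i$. Then

$\bar{u} = \bar{y} - R \bar{x}$ and $\bar{U} = \bar{Y} - R \bar{X} = 0$

Hence,

$V(\hat{R}) = \dfrac{1}{\bar{X}^2} E(\bar{u} - \bar{U})^2 = \dfrac{1}{\bar{X}^2} V(\bar{u}) = \dfrac{1 - f}{n \bar{X}^2} \cdot \dfrac{\sum_{i=1}^{N} (u_i - \bar{U})^2}{N - 1} = \dfrac{1 - f}{n \bar{X}^2} \cdot \dfrac{\sum_{i=1}^{N} (y_i - R x_i)^2}{N - 1}$

Since $\hat{Y}_R = \hat{R} X = \hat{R} N \bar{X}$,

$V(\hat{Y}_R) = V(\hat{R} N \bar{X}) = N^2 \bar{X}^2 V(\hat{R}) = \dfrac{N^2 (1 - f)}{n} \cdot \dfrac{\sum_{i=1}^{N} (y_i - R x_i)^2}{N - 1}$

Also, since $\hat{\bar{Y}}_R = \hat{R} \bar{X}$,

$V(\hat{\bar{Y}}_R) = V(\hat{R} \bar{X}) = \bar{X}^2 V(\hat{R}) = \dfrac{1 - f}{n} \cdot \dfrac{\sum_{i=1}^{N} (y_i - R x_i)^2}{N - 1}$

Theorem 6.2 The ratio estimates of the population total $Y$, the population mean $\bar{Y}$ and the population ratio $R$ are $\hat{Y}_R = \dfrac{y}{x} X$, $\hat{\bar{Y}}_R = \dfrac{y}{x} \bar{X}$ and $\hat{R} = \dfrac{y}{x}$ respectively. For simple random sampling with n large,

$V(\hat{Y}_R) = \dfrac{N^2 (1 - f)}{n} \left[S_y^2 + R^2 S_x^2 - 2 R \rho S_y S_x\right]$

$V(\hat{\bar{Y}}_R) = \dfrac{1 - f}{n} \left[S_y^2 + R^2 S_x^2 - 2 R \rho S_y S_x\right]$

$V(\hat{R}) = \dfrac{1 - f}{n \bar{X}^2} \left[S_y^2 + R^2 S_x^2 - 2 R \rho S_y S_x\right]$

and the square of the coefficient of variation of each estimate is

$(cv)^2 = \dfrac{1 - f}{n} \left[C_{yy} + C_{xx} - 2 C_{yx}\right]$

where $C_{yy} = S_y^2 / \bar{Y}^2$, $C_{xx} = S_x^2 / \bar{X}^2$ and $C_{yx} = \rho S_y S_x / (\bar{Y} \bar{X})$.

Proof: We have

$V(\hat{Y}_R) = \dfrac{N^2 (1 - f)}{n} \cdot \dfrac{\sum_{i=1}^{N} (y_i - R x_i)^2}{N - 1}$ ... (i)

Since $\bar{Y} = R \bar{X}$,

$\sum_{i=1}^{N} (y_i - R x_i)^2 = \sum_{i=1}^{N} \left[(y_i - \bar{Y}) - R (x_i - \bar{X})\right]^2$
$= \sum_{i=1}^{N} (y_i - \bar{Y})^2 + R^2 \sum_{i=1}^{N} (x_i - \bar{X})^2 - 2R \sum_{i=1}^{N} (y_i - \bar{Y})(x_i - \bar{X})$

The population correlation coefficient is

$\rho = \dfrac{\sum_{i=1}^{N} (y_i - \bar{Y})(x_i - \bar{X})}{(N - 1) S_y S_x}$

so that

$\sum_{i=1}^{N} (y_i - R x_i)^2 = (N - 1) S_y^2 + R^2 (N - 1) S_x^2 - 2 R \rho (N - 1) S_y S_x$

Substituting this value in (i),

$V(\hat{Y}_R) = \dfrac{N^2 (1 - f)}{n} \left[S_y^2 + R^2 S_x^2 - 2 R \rho S_y S_x\right]$

The expressions for $V(\hat{\bar{Y}}_R)$ and $V(\hat{R})$ follow in the same way.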
6.4 Ratio Estimate Under Stratified Sampling


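Let $y_{ij}$ be the variable under study and $x_{ij}$ be the auxiliary variable. Let the population be stratified into L strata as follows:

Strata   Values of $y_{ij}$ in the population
1        $y_{11} \quad y_{12} \quad \ldots \quad y_{1N_1}$
2        $y_{21} \quad y_{22} \quad \ldots \quad y_{2N_2}$
...
L        $y_{L1} \quad y_{L2} \quad \ldots \quad y_{LN_L}$

where $N_1 + N_2 + N_3 + \ldots + N_L = N$ (size of population).

Let the sample be drawn from the given population from all strata.

Strata   Values of $y_{ij}$ in the sample
1        $y_{11} \quad y_{12} \quad \ldots \quad y_{1n_1}$
2        $y_{21} \quad y_{22} \quad \ldots \quad y_{2n_2}$
...
L        $y_{L1} \quad y_{L2} \quad \ldots \quad y_{Ln_L}$

where $n_1 + n_2 + n_3 + \ldots + n_L = n$.

Let the corresponding values of the auxiliary variable $x_{ij}$ in the sample be

Strata   Values of $x_{ij}$ in the sample
1        $x_{11} \quad x_{12} \quad \ldots \quad x_{1n_1}$
2        $x_{21} \quad x_{22} \quad \ldots \quad x_{2n_2}$
...
L        $x_{L1} \quad x_{L2} \quad \ldots \quad x_{Ln_L}$

In this situation there are two ways of obtaining a ratio estimate of the population total: (i) separate ratio estimate (ii) combined ratio estimate.

Separate ratio estimate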


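Theorem 6.3 If an independent simple random sample is drawn from each stratum in the population and the sample sizes are large in all strata, then

$V(\hat{Y}_{Rs}) = \sum_{i=1}^{L} \dfrac{N_i^2 (1 - f_i)}{n_i} \left[S_{yi}^2 + R_i^2 S_{xi}^2 - 2 R_i \rho_i S_{yi} S_{xi}\right]$

where $f_i = \dfrac{n_i}{N_i}$ and $R_i = \dfrac{Y_i}{X_i}$.

Proof: We know the variance of the ratio estimate of the population total is given by

$V(\hat{Y}_R) = \dfrac{N^2 (1 - f)}{n} \left[S_y^2 + R^2 S_x^2 - 2 \rho R S_x S_y\right]$

Hence, the variance of the ratio estimate in the $i$th stratum is

$V(\hat{Y}_{Ri}) = \dfrac{N_i^2 (1 - f_i)}{n_i} \left[S_{yi}^2 + R_i^2 S_{xi}^2 - 2 \rho_i R_i S_{xi} S_{yi}\right]$

where $S_{yi}^2 = \dfrac{\sum_{j=1}^{N_i} (y_{ij} - \bar{Y}_i)^2}{N_i - 1}$, $S_{xi}^2 = \dfrac{\sum_{j=1}^{N_i} (x_{ij} - \bar{X}_i)^2}{N_i - 1}$ and $\rho_i = \dfrac{S_{yxi}}{S_{xi} S_{yi}}$.

Since $\hat{Y}_{Rs} = \sum_{i=1}^{L} \hat{Y}_{Ri}$ and the samples are drawn independently in the different strata,

$V(\hat{Y}_{Rs}) = \sum_{i=1}^{L} V(\hat{Y}_{Ri}) = \sum_{i=1}^{L} \dfrac{N_i^2 (1 - f_i)}{n_i} \left[S_{yi}^2 + R_i^2 S_{xi}^2 - 2 \rho_i R_i S_{xi} S_{yi}\right]$

This is valid only if the sample in each stratum is large enough.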


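Theorem 6.4 If a stratified random sample is drawn from the population and the total sample size n is large, then

$V(\hat{Y}_{Rc}) = \sum_{i=1}^{L} \dfrac{N_i^2 (1 - f_i)}{n_i} \left[S_{yi}^2 + R^2 S_{xi}^2 - 2 \rho_i R S_{yi} S_{xi}\right]$

Proof: We have $\hat{Y}_{Rc} = \dfrac{\bar{y}_{st}}{\bar{x}_{st}} X$ and $Y = R X$.

Hence,

$\hat{Y}_{Rc} - Y = \dfrac{\bar{y}_{st}}{\bar{x}_{st}} X - R X = \dfrac{X}{\bar{x}_{st}} \left[\bar{y}_{st} - R \bar{x}_{st}\right] = \dfrac{N \bar{X}}{\bar{x}_{st}} \left[\bar{y}_{st} - R \bar{x}_{st}\right] \approx N \left[\bar{y}_{st} - R \bar{x}_{st}\right]$

Define $u_{ij} = y_{ij} - R x_{ij}$. Then

$\bar{u}_{st} = \bar{y}_{st} - R \bar{x}_{st}$ and $\bar{U} = \bar{Y} - R \bar{X} = 0$

Also, for the $i$th stratum, $\bar{u}_i = \bar{y}_i - R \bar{x}_i$ and $\bar{U}_i = \bar{Y}_i - R \bar{X}_i$.

Now,

$V(\hat{Y}_{Rc}) = E(\hat{Y}_{Rc} - Y)^2 = N^2 E(\bar{u}_{st} - \bar{U})^2 = N^2 V(\bar{u}_{st}) = \sum_{i=1}^{L} \dfrac{N_i^2 (1 - f_i)}{n_i} S_{ui}^2$ ... (i)

Now,

$S_{ui}^2 = \dfrac{1}{N_i - 1} \sum_{j=1}^{N_i} (u_{ij} - \bar{U}_i)^2$ ... (ii)
$= \dfrac{1}{N_i - 1} \sum_{j=1}^{N_i} \left[(y_{ij} - \bar{Y}_i) - R (x_{ij} - \bar{X}_i)\right]^2$
$= \dfrac{1}{N_i - 1} \sum_{j=1}^{N_i} \left[(y_{ij} - \bar{Y}_i)^2 + R^2 (x_{ij} - \bar{X}_i)^2 - 2R (y_{ij} - \bar{Y}_i)(x_{ij} - \bar{X}_i)\right]$
$= S_{yi}^2 + R^2 S_{xi}^2 - 2 R \rho_i S_{yi} S_{xi}$

Substituting this value in equation (i),

$V(\hat{Y}_{Rc}) = \sum_{i=1}^{L} \dfrac{N_i^2 (1 - f_i)}{n_i} \left[S_{yi}^2 + R^2 S_{xi}^2 - 2 R \rho_i S_{yi} S_{xi}\right]$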
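The difference between the separate and the combined ratio estimates of a population total can be seen in a short sketch. The two strata, their sample values and the stratum totals below are hypothetical, and the function names are illustrative only.

```python
def separate_ratio_total(strata):
    """Separate ratio estimate: sum over strata of (y_i / x_i) * X_i,
    using one ratio per stratum."""
    return sum(sum(s["y"]) / sum(s["x"]) * s["X"] for s in strata)

def combined_ratio_total(strata):
    """Combined ratio estimate: (ybar_st / xbar_st) * X, using a single
    ratio of the stratified estimates of the two totals."""
    y_st = sum(s["N"] * sum(s["y"]) / len(s["y"]) for s in strata)
    x_st = sum(s["N"] * sum(s["x"]) / len(s["x"]) for s in strata)
    X = sum(s["X"] for s in strata)
    return y_st / x_st * X

# hypothetical strata: N = stratum size, X = known stratum total of x
strata = [
    {"N": 20, "X": 60,  "y": [4, 6],   "x": [2, 3]},
    {"N": 30, "X": 270, "y": [30, 50], "x": [10, 10]},
]

sep = separate_ratio_total(strata)
comb = combined_ratio_total(strata)
print(sep, round(comb, 4))
```

The separate estimate uses the stratum ratios 2 and 4 individually, while the combined estimate applies one pooled ratio to the overall total X, which is why the two results differ when the stratum ratios are unequal.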


6.5 Regression Estimator Under Stratified Random Sampling


Like the ratio estimate, the linear regression estimate is designed to increase precision by the use of an auxiliary variate $x_i$ that is correlated with $y_i$. When the relation between $y_i$ and $x_i$ is examined, it may be found that although the relation is approximately linear, the line does not go through the origin. This suggests an estimate based on the linear regression of $y_i$ on $x_i$ rather than on the ratio of the two variables.

We suppose that $y_i$ and $x_i$ are each obtained for every unit in the sample and that the population mean $\bar{X}$ of the $x_i$ is known. The linear regression estimate of $\bar{Y}$, the population mean of $y_i$, is

$\bar{y}_{lr} = \bar{y} + b(\bar{X} - \bar{x})$

where $\bar{y}$ and $\bar{x}$ are the sample means of $y_i$ and $x_i$ respectively and $\bar{X}$ is the population mean of the $x_i$. The subscript $lr$ denotes linear regression, and $b$ is an estimate of the change in $y$ when $x$ is increased by unity.
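The linear regression estimate can be sketched in a few lines. Here $b$ is taken to be the least-squares slope computed from the sample, which is one common choice and an assumption of this sketch (the text only requires $b$ to estimate the change in y per unit change in x). The data are hypothetical.

```python
def regression_estimate_mean(y, x, X_mean):
    """Linear regression estimate of the population mean of y:
    ybar + b * (Xbar - xbar), with b the least-squares slope
    of y on x computed from the sample (an assumed choice of b)."""
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    return y_bar + b * (X_mean - x_bar)

# hypothetical sample lying exactly on y = 2x + 1
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
X_mean = 4.0   # assumed known population mean of x

y_lr = regression_estimate_mean(y, x, X_mean)
print(y_lr)
```

Because the sample mean of x (3) is below the known population mean (4), the estimate adjusts the sample mean of y upward by b times the difference.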
6.6 Probability Proportional to Size Sampling

If the units of the population vary in size, probabilities of selection may be assigned to them according to their size. For example, in a survey meant for estimating crop production, it is better that the villages with larger areas be selected with higher probability. Under probability proportional to size (PPS) sampling, large units have a greater chance of being included in the sample as compared to the units of small size.

In the case of simple random sampling, the selection probabilities are equal for all the units of the population. When the units vary considerably in size, simple random sampling is not appropriate because it does not take into account the possible importance of large units in the population. In such situations, auxiliary information on the sizes can be utilized in selecting the sample in order to get more efficient estimators of the population parameters. One simple method is to assign unequal probabilities of selection to different units in the population depending on their sizes.

Suppose some auxiliary characteristic x, closely related to the main character y of interest, is available for all the units of the population. For example, villages with larger geographical area are likely to have larger population and larger area under food crops. It may be desired to provide a sampling scheme in which villages are selected with probability proportional to their population or to their geographical areas.

When the units vary in their sizes and the variate under study is highly correlated with the sizes of the units, probability proportional to size (PPS) sampling can be used to obtain an efficient estimator. The two most frequently used techniques of selecting such a sample are discussed below:

6.6.1 Cumulative total method (Cumulative method)
To select a sample of size n from a population of size N with probability proportional to size, proceed as follows:

Let the size of the $i$th unit be $X_i$ $(i = 1, 2, \ldots, N)$ and let the total of these sizes be $X = \sum_{i=1}^{N} X_i$.

The numbers 1 to $X_1$ are associated with the first unit, the numbers $X_1 + 1$ to $X_1 + X_2$ with the second unit, and so on. A random number R between 1 and X is then chosen from a random number table. If

$X_1 + X_2 + \ldots + X_{i-1} < R \le X_1 + X_2 + \ldots + X_i$

then the $i$th unit, which is the unit associated with this random number, is selected as a sample unit. The procedure is repeated n times to obtain a sample of size n.
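The cumulative total method translates directly into code. The sketch below assumes positive integer sizes and draws with replacement; Python's `random` module stands in for the random number table, and the function names are illustrative.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def unit_for_number(cum_totals, r):
    """Index i (0-based) of the unit whose range contains r, i.e. the
    first i with cum_totals[i] >= r, so that
    cum_totals[i-1] < r <= cum_totals[i]."""
    return bisect_left(cum_totals, r)

def pps_cumulative(sizes, n, rng=random):
    """Select n units with replacement, with probability proportional
    to size, by the cumulative total method."""
    cum_totals = list(accumulate(sizes))   # X1, X1+X2, ..., X
    total = cum_totals[-1]
    chosen = []
    for _ in range(n):
        r = rng.randint(1, total)          # random number in 1..X
        chosen.append(unit_for_number(cum_totals, r))
    return chosen

sizes = [3, 5, 2]                          # hypothetical unit sizes
cum = list(accumulate(sizes))              # [3, 8, 10]
print([unit_for_number(cum, r) for r in range(1, 11)])
print(pps_cumulative(sizes, 4, random.Random(1)))
```

Each unit owns as many of the numbers 1..X as its size, so the chance that a uniformly drawn R falls in its range is exactly $X_i / X$.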


6.6.2 Lahiri's Method

The main drawback of the cumulative method is that the construction of ranges for the items is time consuming as well as costly if the population consists of a large number of units.

As an alternative to this method, Lahiri (1951) developed a simple method which does not require the construction of cumulative totals.
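The steps of Lahiri's method are not reproduced in this part of the text, so the sketch below follows the procedure as it is commonly described: pick a unit at random from 1 to N and a random number from 1 to M, where M is the largest size (or any convenient upper bound), and accept the unit only if the random number does not exceed its size; otherwise repeat. The function name and the sizes are illustrative assumptions.

```python
import random

def lahiri_select(sizes, rng=random):
    """Select one unit with probability proportional to size using
    Lahiri's accept/reject scheme (no cumulative totals needed).
    Sizes are assumed to be positive integers."""
    n_units = len(sizes)
    max_size = max(sizes)                  # M: upper bound on the sizes
    while True:
        i = rng.randrange(n_units)         # random unit, 0-based
        r = rng.randint(1, max_size)       # random number in 1..M
        if r <= sizes[i]:                  # accept unit i
            return i
        # otherwise reject the pair (i, r) and draw again

rng = random.Random(2024)
sizes = [1, 9]                             # hypothetical unit sizes
draws = [lahiri_select(sizes, rng) for _ in range(10000)]
share_of_large_unit = draws.count(1) / len(draws)
print(round(share_of_large_unit, 2))
```

In the long run the accepted units appear in proportion to their sizes, at the price of some rejected draws when the sizes are very unequal.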
Let $y_i$ be the value of the study variable for the $i$th unit in the population and let $p_i$ be the probability of selection associated with the $i$th unit, proportional to its size. In PPS sampling with replacement, the sample observations are independently and identically distributed. Consider the problem of estimating the population mean $\bar{Y}$ (or the population total Y) based on a sample of size n selected using PPS with replacement.

To estimate $\bar{Y}$, we define $z_i = \dfrac{y_i}{N p_i}$. Then

$\bar{z} = \dfrac{1}{n} \sum_{i=1}^{n} z_i = \dfrac{1}{nN} \sum_{i=1}^{n} \dfrac{y_i}{p_i}$

is an estimator of the population mean $\bar{Y}$, where each $z_i$ can take the value corresponding to any of the N units in the population with probability proportional to size $p_i$. Since

$E(z_i) = \sum_{i=1}^{N} p_i \dfrac{y_i}{N p_i} = \bar{Y}$

the estimate $\bar{z}$ based on a sample selected with PPS with replacement satisfies $E(\bar{z}) = \bar{Y}$.

Theorem 6.6
6.7 Sampling Distribution
6.7.1 Chi Squared Distribution
The $\chi^2$-distribution was initially discovered by Helmert in 1875 and was defined independently by Karl Pearson in 1900.

The square of a standard normal variate is known as a chi-squared variate with 1 d.f. If $X \sim N(\mu, \sigma^2)$, then $Z = \dfrac{X - \mu}{\sigma}$ is $N(0, 1)$ and $Z^2 = \left(\dfrac{X - \mu}{\sigma}\right)^2$ is a chi-squared variate with 1 d.f.

In general, if $X_i$ $(i = 1, 2, \ldots, n)$ are n independent normal variates with means $\mu_i$ and variances $\sigma_i^2$, then

$\chi^2 = \sum_{i=1}^{n} \left(\dfrac{X_i - \mu_i}{\sigma_i}\right)^2$

is a chi-square variate with n d.f.

6.7.2 Some Properties of $\chi^2$-distribution

1. Mean and variance of the chi-square distribution with n degrees of freedom are n and 2n respectively.
2. Mode is n − 2 for n > 2.
3. Karl Pearson's coefficient of skewness is $\sqrt{\dfrac{2}{n}}$.
4. $\beta_1 = \dfrac{8}{n}$ and $\gamma_1 = 2\sqrt{\dfrac{2}{n}}$.
5. $\beta_2 = 3 + \dfrac{12}{n}$ and $\gamma_2 = \dfrac{12}{n}$.
6. Its only parameter is the degrees of freedom n.
7. The sum of two independent chi-square variates is a chi-square variate, i.e., if $X_1 \sim \chi^2_n$ and $X_2 \sim \chi^2_m$, then $X_1 + X_2 \sim \chi^2_{n+m}$.
8. The moment generating function of $\chi^2$ is $(1 - 2t)^{-n/2}$.
9. ... the chi-square distribution gives the positive half of the normal distribution curve.
10. If n = 2, it gives the exponential distribution with mean 2.
11. As $n \to \infty$, $\chi^2_n$ tends to the normal distribution with mean n and variance 2n.
12. If n > 30, $\sqrt{2\chi^2}$ follows approximately the normal distribution with mean $\sqrt{2n - 1}$ and variance 1.
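The first property (mean n, variance 2n) can be checked by simulation, building each chi-square value as a sum of squared standard normal variates. This is only an illustrative sketch using the standard library; the seed, the number of draws and n = 5 are arbitrary choices.

```python
import random
from statistics import mean, pvariance

def chi_square_draw(n, rng):
    """One chi-square variate with n d.f., built as the sum of
    squares of n independent standard normal variates."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

rng = random.Random(12345)   # fixed seed so the run is repeatable
n = 5
draws = [chi_square_draw(n, rng) for _ in range(20000)]

sim_mean = mean(draws)       # should be close to n
sim_var = pvariance(draws)   # should be close to 2n
print(round(sim_mean, 2), round(sim_var, 2))
```

With 20000 draws the simulated mean and variance typically agree with n and 2n to within a few percent, though the exact figures depend on the seed.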


6.7.3 Application of $\chi^2$ Distribution


The $\chi^2$ distribution is used in testing of hypothesis as follows.
1. To test the significance of sample variance.
2. To test goodness of fit.
3. To test the independence of attributes.
4. To test the independence of estimates of population variance, correlation coefficient etc.
5. To find the distribution of sample variance.


6.7.4 Fisher's z-Distribution (For large Sample)


In most cases a sample is regarded as large if n > 30. The test of significance enables us to decide whether (i) the difference between the observed sample statistic and the (assumed) population parameter value, or (ii) the difference between two independent sample statistics, is significant or might be due to chance or fluctuation of sampling. For large samples, the sampling distribution of a sample statistic (t) conforms to the normal distribution.


6.7.5 Assumption of z-distribution
(i) n > 30, i.e., a large-sample sampling distribution.
(ii) The samples have been drawn from a normal population and the random sampling distribution of a sample statistic (t) is nearly normal.
(iii) Population variance ($\sigma^2$) is known and sampling units are independent.
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i= jstr je mean distribu ait cegeuede 
ofasingle P ibution of in we er 
ouble nn : different betwee 
; babill 
ee spution) 
incon jstribu | 
6.7.7 Student's t Distribution

Let $x_1, x_2, x_3, \ldots, x_n$ be a random sample of size n drawn from a normal population having mean $\mu$ and standard deviation $\sigma$. Then

$\dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

would be distributed normally with mean 0 and variance unity. It would be a standard normal variate (SNV) and can be used for testing the significance of the sample mean.


6.7.8 Fisher's t Distribution
It is the ratio of a standard normal variate to the square root of an independent chi-square variate divided by its degrees of freedom. If $X \sim N(0, 1)$ and $Y \sim \chi^2_n$ such that X and Y are independent, then Fisher's t is given by

$t = \dfrac{X}{\sqrt{Y / n}}$

which follows Student's t-distribution with n degrees of freedom.

Its probability density function is given by

$f(t) = \dfrac{1}{\sqrt{\nu}\, B\left(\frac{1}{2}, \frac{\nu}{2}\right)} \left(1 + \dfrac{t^2}{\nu}\right)^{-(\nu + 1)/2}, \quad -\infty < t < \infty$

where $\nu = n$ is the degrees of freedom.


6.7.9 Some Properties of t-distribution

1. Odd ordered raw moments about the origin are zero, i.e., $\mu'_{2r+1} = 0$; r = 0, 1, 2, ...
2. Mean is zero.
3. Odd ordered central moments are zero, i.e., $\mu_{2r+1} = 0$; r = 0, 1, 2, ...
4. The even ordered moment is given by
$\mu_{2r} = \mu'_{2r} = n^r \dfrac{(2r - 1)(2r - 3) \cdots 3 \cdot 1}{(n - 2)(n - 4) \cdots (n - 2r)}, \quad n > 2r$
5. Variance is given by $\dfrac{n}{n - 2}$, n > 2.
6. $\beta_1 = 0$ and $\beta_2 = \dfrac{3(n - 2)}{n - 4}$, n > 4.
7. Mode is zero.
8. As $n \to \infty$, the t distribution tends to the normal distribution.
9. The square of a t variate with n degrees of freedom follows the F distribution with (1, n) degrees of freedom.


Application of t Distribution for Sampling

1. The t-distribution is used to test the significance of a sample mean when the population variance is unknown.
2. The t-distribution is used to test the significance of the difference of sample means when the population variance is unknown.
3. To test the significance of correlation coefficients and regression coefficients.
4. To test the significance of paired samples.


Exercise 6.1
1. Describe the types of sampling survey methods.
2. What do you mean by simple random sampling? Describe the methods of selecting a sample with replacement and without replacement.
3. What do you mean by random sampling? Describe the different types of random sampling.
4. Write short notes on (a) Simple random sampling (b) Stratified sampling (c) Systematic sampling (d) Cluster sampling (e) Multistage sampling.
5. What is pps sampling? Differentiate between srs and pps sampling.
6. When is pps sampling more effective than simple random sampling?
7. Describe the method of selecting a pps sample with replacement.
8. Find the relation for the unbiased estimate of the population mean and its variance.
9. Find the relation for the unbiased estimate of the population total and its variance.


Exercise 6.2



Multiple Choice Questions: circle (O) the correct answer.
1. A sample consists of
(a) all units of the population (b) 50% units of the population
(c) 5% of the population (d) any fraction of the population
2. Sampling is inevitable in the situations
(a) blood test of a person (b) when the population is infinite
(c) testing of life of dry battery cells (d) all the above
3. In case of systematic sampling
(a) sample mean is a biased estimator of population mean
(b) sample mean is an unbiased estimator of population mean
(c) sample mean cannot estimate population mean
(d) sample mean may equal population mean
4. Mean of $\chi^2$-distribution with n degrees of freedom is
(a) 1 (b) 0 (c) 2n (d) n
5. Variance of $\chi^2$-distribution with n degrees of freedom is
(a) 1 (b) 0 (c) ... (d) ...
6. Probability of selection varies at each subsequent draw in
(a) sampling without replacement (b) sampling with replacement
(c) both (a) and (b) (d) neither (a) nor (b)


(d) Ways 


red je of size” © = way ' 
8. Probability of any one sample of size n being drawn out of N units is:
(a) 1/N (b) ... (c) ... (d) ...
9. Probability of including a specified unit in a sample of size n selected out of N units is:
(a) ... (b) ... (c) ... (d) ...
10. A selection procedure of a sample having no involvement of probability is known as:
(a) purposive sampling (b) judgment sampling (c) subjective sampling (d) all the above
11. An estimate based on a fixed set of values of a sample always possesses:
(a) a single value (b) any value
(c) a value equal to one (d) all the above
12. Student's t is categorized as:
(a) an estimate (b) an estimator
(c) a statistic (d) none of above
13. If each and every unit of a population has an equal chance of being included in the sample, it is known as:
(a) ... (b) ... (c) ... (d) ...
14. The most important factor in determining the size of a sample is:
(a) the availability of resources (b) purpose of the survey
(c) heterogeneity of population (d) none of the above
15. If n units are selected in a sample from N population units, the sampling fraction is given as:
(a) ... (b) ... (c) ... (d) ...
16. Mean of the simple random sample has the variance:
(a) ... (b) ... (c) ... (d) ...
17. The variance of the $\chi^2$-distribution with n degrees of freedom is
(a) n (b) ... (c) 2n (d) $2\sqrt{n}$
18. The mean of the t-distribution is
(a) always zero (b) always less than ... (c) uncertain (d) none
19. The limits of the F-distribution are
(a) ... (b) 0 to $\infty$ (c) ... (d) none
20. The sum of squares of t-variates is a
(a) ... (b) ... (c) $\chi^2$-variate (d) none


7.1 Introduction

The design of experiment is the planning of the experiment in such a way that relevant information is collected in a systematic way for the problem under study, so that efficient inference can be drawn. Hence it is a plan, structure and strategy for decision making, based on the significance of variables. It is a way of getting an answer to the question which is in the experimental unit of the problem under (a) absolute and (b) comparative study.

7.1.1 Terminology

Treatment: The specific different procedures under comparison in an experiment are called treatments, e.g., in an agricultural experiment the different varieties of crops or different measurements will be the treatments.

Experimental Unit: The place where different treatments are used is called the experimental unit.

Experimental Error: The error which arises at the time of the experiment and cannot be controlled by humans is called experimental error.


7.2 Principles of Design of Experiment

The principles of design of experiment are:
i. Replication ii. Randomization iii. Local control (Blocking)

Replication: It means repetition of treatment. A treatment is replicated r times if it is allocated to r experimental units.
Randomization: It means the experimental units are arranged in different types of layout in a random manner.
Local control: It means the experimental units are blocked to control the variance and minimize the error.

7.3 Concept of Analysis of Variance (ANOVA)

The analysis of variance (ANOVA) is a powerful statistical tool for tests of significance to evaluate differences among the parameters of several groups.

If we have to test the significance of the difference between more than two means, the t-test is not useful and ANOVA is used. In other words, ANOVA is a statistical technique specially designed to test whether the means of more than two quantitative populations are equal. It enables us to draw inference about whether all the samples are from the same normal population. ANOVA was developed by Ronald Fisher in 1918 for the design of agricultural experiments.

The total variation present in a set of observable quantities can, under certain circumstances, be partitioned into a number of components related with the nature of classification of the data. The organized process for achieving this is called the analysis of variance.

It is a dominant statistical measure used for tests of significance. The test of significance based upon the t distribution is adequate only for the difference between two means, so that another test is necessary when more groups are involved. In such conditions, the ANOVA technique, based upon the F distribution, is taken to test the significant difference.


For example, suppose a researcher tests the effect of the fertilizers Nitrogen, Phosphorous and Potash on the yield of a crop. The t-test is not sufficient because three groups are concerned. Hence, ANOVA is adopted to compare the yield of crop from several varieties of fertilizer. ANOVA is useful for the analysis of agronomical data; it was introduced by Fisher to deal with the problem in the analysis of agronomical data. Variation is inherent in nature. The total variation present in any set of numerical data is due to two causes, namely (i) chance causes and (ii) assignable causes.

Professor R.A. Fisher was the first man to use the term 'Variance' and, in fact, it was he who developed a very elaborate theory concerning ANOVA, explaining its usefulness in the practical field. Later on Professor Snedecor and many others contributed to the development of this technique. ANOVA is essentially a procedure for testing the difference among different groups of data for homogeneity. "The essence of ANOVA is that the total amount of variation in a set of data is broken down into two types, that amount which can be attributed to chance and that amount which can be attributed to specified causes". There may be variation between samples and also within sample items. ANOVA consists in splitting the variance for analytical purposes. Hence, it is a method of analyzing the variance to which a response is subject into its various components corresponding to various sources of variation. Through this technique one can explain whether various varieties of seeds or fertilizers or soils differ significantly, so that a policy decision could be taken accordingly concerning a particular variety in the context of agricultural research. Similarly, the differences in various types of drugs manufactured for curing a specific disease may be studied and judged to be significant or not through the application of ANOVA techniques. Likewise, a manager of a big concern can analyse the performance of various salesmen of his concern in order to know whether their performances differ significantly.
ANOVA in Research Methodology 


Thus, through AN : ty | 
hypothesized or said metic ie acorn ae te number or factors which are 
amongst various categories within values. If we take sibs i ee as well investigate the differences 
lues — aes investigate the differences 
site tue Shea € said to use one-way ANOVA ‘ni 
relation between two ie way ANOVA. In a two or more way 
a dependent variable ¢ pendent variables/factors), if any between 

an as well be studied for hatherdaelelont. 


7.4 F-Statistics and F Distribution 
7.4.1 F-Statistics 


ANOVA, the interaction ( i.e. inter- 
two independent variables affecting 


If $X \sim \chi^2_{(m)}$ and $Y \sim \chi^2_{(n)}$ are two independent chi-squared variates, then the F statistic is defined as the ratio of the two variates divided by their respective degrees of freedom,

$$F = \frac{X/m}{Y/n},$$

which follows the F distribution with (m, n) degrees of freedom. Here, chi-squared variates are the sum of the squares of standard normal variates. Hence $F \sim F_{(m,\,n)}$.

The F statistic is also defined as the ratio of two independent unbiased estimates of population variances. It is given by

$$F = \frac{S_1^2}{S_2^2}, \qquad S_1^2 > S_2^2,$$

where

$$S_1^2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1}(X_{1i} - \bar{X}_1)^2, \qquad S_2^2 = \frac{1}{n_2 - 1}\sum_{j=1}^{n_2}(X_{2j} - \bar{X}_2)^2.$$

Here $X_{1i}$ (i = 1, 2, 3, ..., $n_1$) and $X_{2j}$ (j = 1, 2, 3, ..., $n_2$) are sample values of two independent samples of sizes $n_1$ and $n_2$ respectively drawn from the population, and $\bar{X}_1 = \frac{1}{n_1}\sum X_{1i}$ and $\bar{X}_2 = \frac{1}{n_2}\sum X_{2j}$ are the two sample means of the samples of sizes $n_1$ and $n_2$ respectively. Also, $S_1^2$ and $S_2^2$ are two unbiased estimates of the population variance $\sigma^2$ from the two samples.

Then,

$$F = \frac{S_1^2}{S_2^2} = \frac{\dfrac{(n_1-1)S_1^2}{\sigma^2}\Big/(n_1-1)}{\dfrac{(n_2-1)S_2^2}{\sigma^2}\Big/(n_2-1)} = \frac{\chi^2_{(n_1-1)}/(n_1-1)}{\chi^2_{(n_2-1)}/(n_2-1)} \sim F_{(n_1-1,\,n_2-1)}.$$

The ratio of two independent chi-square variates with $n_1 - 1$ and $n_2 - 1$ degrees of freedom is distributed as Beta distribution of second kind with parameters $\frac{n_1-1}{2}$ and $\frac{n_2-1}{2}$. Hence, $\frac{n_1-1}{n_2-1}F$ is also distributed as Beta distribution of second kind with parameters $\frac{n_1-1}{2}$ and $\frac{n_2-1}{2}$, i.e. $\frac{n_1-1}{n_2-1}F$ is a $\beta_2\left(\frac{n_1-1}{2}, \frac{n_2-1}{2}\right)$ variate.
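As a small illustration of the variance-ratio form of the F statistic, the following Python sketch computes F as the ratio of the two unbiased sample variances, with the larger estimate placed in the numerator. The function name `variance_ratio_f` and the sample values are illustrative assumptions and are not taken from the text.

```python
from statistics import variance  # unbiased sample variance (divisor n - 1)


def variance_ratio_f(sample1, sample2):
    """Return (F, (df_numerator, df_denominator)) for two samples.

    F is the ratio of the two unbiased variance estimates, with the
    larger estimate in the numerator so that F >= 1.
    """
    s1_sq = variance(sample1)
    s2_sq = variance(sample2)
    if s1_sq >= s2_sq:
        return s1_sq / s2_sq, (len(sample1) - 1, len(sample2) - 1)
    return s2_sq / s1_sq, (len(sample2) - 1, len(sample1) - 1)


# Hypothetical data, for illustration only.
f_value, dof = variance_ratio_f([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(f_value, 2), dof)
```

The computed F would then be compared with the tabulated F value for the returned pair of degrees of freedom.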
7.5 Linear Model 


Suppose $y_1, y_2, \ldots, y_n$ are n quantitative observations to control the error and to minimize the variation. In all the cases it is assumed that the observed value is composed of two parts, assignable causes and chance causes.

Let the model be $y_i = \mu_i + e_i$
where $\mu_i$ is true value due to assignable causes and $e_i$ is error due to chance causes. The true value $\mu_i$ is again assumed to be a linear function of k unknown quantities $\tau_1, \tau_2, \ldots, \tau_k$ called effects:
$\mu_i = \beta_{i1}\tau_1 + \beta_{i2}\tau_2 + \cdots + \beta_{ik}\tau_k$, where $\beta_{ij}$ are known, usually 0 or 1, j = 1, 2, 3, ..., k.
This set up used in the analysis of variance is called linear model.
If the subscript 'i' is used to denote the group or class (i.e. the treatment group), 'i' taking the values '1 to a', whereas the subscript 'j' designates the members of the class, 'j' taking the values '1 to n' (hence 'a' groups and 'n' replicates per group), then within class 'i' the observations $x_{ij}$ are assumed to be normally distributed about a mean $\mu$ with variance $\sigma^2$. This linear model can be written as

$x_{ij} = \mu + a_i + e_{ij}$    (1)

Hence, an observed value $x_{ij}$ is the sum of three parts:
i. the overall mean of the observations ($\mu$),
ii. a treatment or class deviation 'a' and
iii. a random element 'e' drawn from a normally distributed population.
The random element reflects the combined effects of natural variation between replications and errors of measurement. All the more complex types of ANOVA can be derived from this simple model by the addition of one or more further terms to equation (1).
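The three-part structure of equation (1) can be made concrete with a short Python sketch that splits every observation into the overall mean, an estimated class deviation and a residual. The function name `decompose` and the data are hypothetical; the estimates used here (class mean minus overall mean for the deviation) are the usual sample analogues and are an assumption of this sketch.

```python
def decompose(groups):
    """Split each observation into overall mean + class deviation + residual.

    groups: list of lists, one inner list of replicates per class.
    Returns (overall_mean, [(class_deviation, residuals), ...]).
    """
    all_values = [x for g in groups for x in g]
    overall_mean = sum(all_values) / len(all_values)
    parts = []
    for g in groups:
        class_dev = sum(g) / len(g) - overall_mean          # estimate of a_i
        residuals = [x - overall_mean - class_dev for x in g]  # estimates of e_ij
        parts.append((class_dev, residuals))
    return overall_mean, parts


# Hypothetical data: two classes with three replicates each.
mean, parts = decompose([[1, 2, 3], [4, 5, 6]])
for class_dev, residuals in parts:
    print(mean, class_dev, residuals)
```

Adding the three parts back together reproduces each original observation, which is exactly what equation (1) states.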


Types of ANOVA described (table of additional components and interaction terms; the model entries are not recoverable):
One-way, fixed
One-way, random
Two-way, randomised blocks
Three-way
Two-factor, factorial
Factorial, split-plot

[In each case, the dependent variable (x) can be considered to be the sum of the constant elements, the additional components and, where appropriate, the interaction terms in each row.]


Assumptions of ANOVA
i. The population for each sample must be normally distributed with same mean and variance.
ii. All sample observations must be randomly selected and independent.
iii. Various treatments and environmental effects are additive in nature.
iv. Population variance must be equal.
v. Population variance is linear and independent.
vi. All the observations should be homogeneous as far as possible.


Application of ANOVA 
i. To test the significance of homogeneity between three or more than three groups of data. 
ii. To test the significance of linearity of the fitted regression line and over all significance of 
regression model. 
iii. To test the significance of observed sample correlation ratio.
iv. To test the significance of observed multiple correlation coefficient. 
v. To test the significance of equality of two variances. 


7.6 Types of ANOVA 
There are three types of ANOVA:
i) One way ANOVA
ii) Two way ANOVA
iii) Multi way ANOVA
In detail, ANOVA can be classified as:


i. One-way ANOVA, 'random effects' model
vi. Factorial ANOVA, repeated measures factorial ANOVA


In each case, the 

e > the type of experimental desion 
selon ae limitations of the ata "ANOVA et a Statistical model is given and 
€ Statistical mode] discussed iti f 0" 

ee i types of ANOVA described oe mnination of the puniber = ae ahaa oles sone 
of those that eae to illustrate the meth, replications are cons! ot be 

i ed in mo: Odology and the results quoted may ° 

Periments, 


li. two-way ANOVA in randomized bloc 
V. factorial ANOVA, split-plot desig" ” 


Te extensive ex 


One Way Classification or One-Factor Experiments of ANOVA


A e to test significance of data with sj 
if we hav mui with single facto 
s ss of variance. If follows duplication and deviation dihaee Wise or column wise) is 
| if we test the significance of observed variable row ind pee one way 
| the following description is u nd column wij 
gee eaten’ seful. Ise for three and oe 
t: 
psy 8 a 
Consider n observations classified into k classes of sizes $n_1, n_2, n_3, \ldots, n_k$ respectively such that $n = \sum_{i=1}^{k} n_i$. Let $x_{ij}$ be the j-th observation in the i-th class, i = 1, 2, 3, ..., k and j = 1, 2, 3, ..., $n_i$.

The total variation in $x_{ij}$ observations can be split into
a. Variation between the classes commonly known as treatments. 
b. Variation within classes commonly called the inherent variation of the random variable. 
The first type of variation is due to assignable causes and second type of variation is due to chance causes. 
The main objective of the ANOVA is to examine if there is significant difference between the class means. 
Mathematical Model 
$x_{ij} = \mu_i + e_{ij} = \mu + (\mu_i - \mu) + e_{ij} = \mu + \alpha_i + e_{ij}$;  i = 1, 2, 3, ..., k;  j = 1, 2, 3, ..., $n_i$
Where, $x_{ij}$ = j-th observation in the i-th class
$\mu_i$ = Respective means of $x_{ij}$
$\mu$ = General mean effect (grand mean) and is given by $\mu = \frac{1}{n}\sum_{i=1}^{k} n_i \mu_i$
$\alpha_i = \mu_i - \mu$ = Effect due to i-th class (treatment effect)
$e_{ij}$ = Error due to chance.
Assumptions
i) All the observations $x_{ij}$, which are used in above relation, are independent.
ii) Different effects are additive in nature.
iii) All the required observations are drawn from the population having constant variance.

Hypothesis of Interest
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$ (Population means of all k classes are equal), which can be transformed to $\alpha_1 = \alpha_2 = \alpha_3 = \cdots = \alpha_k = 0$.
$H_1$: At least one $\mu_i$ is different (Population mean of at least one of the k classes is different) for i = 1, 2, 3, ..., k.
A test of these hypotheses requires that we have available a random sample from each population or treatment.
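The one-way test can be sketched in Python using the correction-factor method that the worked examples in this chapter follow (TSS from the raw sum of squares, SSB from the class totals, SSE by subtraction). The function name `one_way_anova`, the dictionary keys and the data are illustrative assumptions; the sketch takes raw observations grouped by class.

```python
def one_way_anova(groups):
    """One-way ANOVA by the correction-factor method.

    groups: list of lists, one inner list of observations per class.
    Returns the sums of squares, degrees of freedom, mean squares and F.
    """
    k = len(groups)                                   # number of classes
    n = sum(len(g) for g in groups)                   # total observations
    grand_total = sum(sum(g) for g in groups)         # T
    correction = grand_total ** 2 / n                 # T^2 / n
    tss = sum(x * x for g in groups for x in g) - correction
    ssb = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    sse = tss - ssb                                   # TSS = SSB + SSE
    msb = ssb / (k - 1)
    mse = sse / (n - k)
    return {"TSS": tss, "SSB": ssb, "SSE": sse,
            "df": (k - 1, n - k), "MSB": msb, "MSE": mse, "F": msb / mse}


# Hypothetical data: three classes with three observations each.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(result["df"], round(result["F"], 2))
```

The calculated F is then compared with the tabulated F for (k - 1, n - k) degrees of freedom at the chosen level of significance.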


Complete the following ANOVA table and answer the followings. What type of ANOVA was employed? How many observations were analyzed? At 0.05 level of significance, can one conclude that the samples have different effect?

Source of variation | Sum of squares | d.f. | Mean square | F
Between samples | 100 | | |
Within samples | | 8 | |
Total | 130 | 10 | |

Solution: Here, 130 = 100 + SSE
or, SSE = 130 - 100 = 30
d.f. for TSS = d.f. for SSS + d.f. for SSE
or, 10 = d.f. for SSS + 8
or, d.f. for SSS = 10 - 8 = 2, MSS = 100/2 = 50, MSE = 30/8 = 3.75

F = MSS/MSE = 50/3.75 = 13.33
One way ANOVA has been applied, because the classification is according to samples only.

At 0.05 level of significance, $F_{0.05}(2, 8) = 4.46$.
Here, F = 13.33 > $F_{0.05}(2, 8) = 4.46$, hence one can conclude that the samples have different effect.
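The bookkeeping used to fill in a partly given one-way ANOVA table can be sketched as follows. The function name `complete_one_way_table` and its parameter names are hypothetical; the input figures are the ones quoted in the solution above (total sum of squares 130, between-samples sum of squares 100, total d.f. 10, error d.f. 8).

```python
def complete_one_way_table(tss, ssb, df_total, df_error):
    """Fill in the missing entries of a one-way ANOVA table."""
    sse = tss - ssb                       # TSS = SSB + SSE
    df_between = df_total - df_error      # d.f. add up in the same way
    msb = ssb / df_between
    mse = sse / df_error
    return {"SSE": sse, "df_between": df_between,
            "MSB": msb, "MSE": mse, "F": msb / mse}


table = complete_one_way_table(tss=130, ssb=100, df_total=10, df_error=8)
print(table)
```

The number of observations follows from the total degrees of freedom as df_total + 1.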


Solved Examples


Three training methods were compared to see if they led to greater average productivity after training. Below are productivity measures for individuals trained by each method.
Method 1:
Method 2:
Method 3:
At the 0.05 level of significance, do the three training methods lead to a different level of productivity?


Solution: Null hypothesis, $H_0: \mu_1 = \mu_2 = \mu_3$ i.e., the productivity is the same for all methods.
Alternative hypothesis, $H_1: \mu_1 \neq \mu_2 \neq \mu_3$ i.e., the productivity is not the same for all methods.
Test statistic: Under $H_0$, F = MSB/MSE

n = 18
Correction factor = $T^2/n = (809)^2/18 = 36360.06$
TSS = Total sum of squares = $\sum\sum x_{ij}^2 - T^2/n$ = 37025 - 36360.06 = 664.94
SSB = $\frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} - \frac{T^2}{n}$
= $\frac{(271)^2}{6} + \frac{(288)^2}{6} + \frac{(250)^2}{6} - 36360.06$ = 120.77
SSE = TSS - SSB = 664.94 - 120.77 = 544.17


One way ANOVA table

Source of variation | d.f. | SS | MS | F ratio
Between samples | 3 - 1 = 2 | 120.77 | MSB = 120.77/2 = 60.385 | F = 60.385/36.28 = 1.66
Within samples (error) | 18 - 3 = 15 | 544.17 | MSE = 544.17/15 = 36.28 |
Total | 18 - 1 = 17 | 664.94 | |


Level of significance, $\alpha$ = 0.05
Degree of freedom: (2, 15)
Critical value: $F_{0.05}$ for (2, 15) d.f. = 3.68
Decision: Since calculated value of F is less than tabulated value, the null hypothesis $H_0$ is accepted. Hence we conclude that the productivity is the same for all methods.
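When only the class totals, class sizes and the raw sum of squares are at hand, as in the solution above, the same calculation can be sketched in Python. The function name `anova_from_totals` and its parameter names are hypothetical; the figures passed in (totals 271, 288 and 250, six observations per method, raw sum of squares 37025) are those quoted in the worked solution.

```python
def anova_from_totals(group_totals, group_sizes, raw_sum_of_squares):
    """One-way ANOVA from summary figures (correction-factor method)."""
    k = len(group_totals)
    n = sum(group_sizes)
    grand_total = sum(group_totals)
    correction = grand_total ** 2 / n                       # T^2 / n
    tss = raw_sum_of_squares - correction
    ssb = sum(t ** 2 / m for t, m in zip(group_totals, group_sizes)) - correction
    sse = tss - ssb
    msb = ssb / (k - 1)
    mse = sse / (n - k)
    return {"CF": correction, "TSS": tss, "SSB": ssb, "SSE": sse,
            "df": (k - 1, n - k), "MSB": msb, "MSE": mse, "F": msb / mse}


result = anova_from_totals([271, 288, 250], [6, 6, 6], 37025)
print({name: round(value, 2) for name, value in result.items() if name != "df"})
```

Working from totals avoids re-entering the raw data, at the cost of trusting the quoted summary figures.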


(Data table of returns by sector, with rows such as Financial sector and Manufacturing sector: values not recoverable.)

Is there any significant difference in the average return due to the sectors? Test the hypothesis at 1% level of significance.

Solution: Null hypothesis, $H_0: \mu_1 = \mu_2 = \mu_3$ i.e., there is no significant difference in average return due to the sectors.
Alternative hypothesis, $H_1: \mu_1 \neq \mu_2 \neq \mu_3$ i.e., there is significant difference in average return due to the sectors.
Test statistic: Under $H_0$, F = MSB/MSE

T = Grand total = 40 + 15 + 10 = 65
n = 15
Correction factor = $T^2/n = (65)^2/15 = 281.67$
Total sum of squares, TSS = $\sum\sum x_{ij}^2 - T^2/n$ = 433 - 281.67 = 151.33
SSB = Sum of square between samples (i.e., sectors)

Level of significance: $\alpha$ = 0.05
Degree of freedom: (2, 12)
Critical value: $F_{0.05}$ for (2, 12) d.f. = 3.89
Decision: Since calculated value of F is greater than tabulated value, the null hypothesis $H_0$ is rejected, i.e., the alternative hypothesis $H_1$ is accepted. Hence we conclude that there is significant difference in average return due to the sectors.


Example 7.4: The number of claims processed per day for a group of insurance company employees is observed. Test whether the employees' mean claims per day are all the same. Use the 0.05 level of significance.
(Data for Employee 1 to Employee 4: values not recoverable.)

Solution: Null hypothesis, $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ i.e., the employees' mean claims per day are all same.
Alternative hypothesis, $H_1: \mu_1 \neq \mu_2 \neq \mu_3 \neq \mu_4$ i.e., the employees' mean claims per day are not all same.
Test statistic: Under $H_0$: F = MSB/MSE, where MSB = Mean sum of square between samples

n = 19
Correction factor = $T^2/n = (245)^2/19 = 3159.21$
TSS = $\sum\sum x_{ij}^2 - T^2/n$ = 3245 - 3159.21 = 85.79
SSB = Sum of square between samples (i.e., employees)
= $\frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} + \frac{(\sum X_4)^2}{n_4} - \frac{T^2}{n}$
= $\frac{(58)^2}{4} + \frac{(52)^2}{4} + \frac{(65)^2}{5} + \frac{(70)^2}{6} - 3159.21$ = 19.46
SSE = TSS - SSB = 85.79 - 19.46 = 66.33 [In one way ANOVA, we have TSS = SSB + SSE]

One way ANOVA Table
Source of variation | Degree of freedom | Sum of squares | Mean sum of squares | F ratio
Between samples | 4 - 1 = 3 | 19.46 | MSB = 19.46/3 = 6.49 | F = 6.49/4.42 = 1.47
Within samples (error) | 18 - 3 = 15 | 66.33 | MSE = 66.33/15 = 4.42 |
Total | 19 - 1 = 18 | 85.79 | |

Level of significance: $\alpha$ = 0.05
Degree of freedom: (3, 15)
Critical value: $F_{0.05}$ for (3, 15) d.f. = 3.29.
Decision: Since calculated value of F is less than tabulated value, the null hypothesis $H_0$ is accepted. Hence we conclude that the employees' mean claims per day are all same.


Example 7.5: There are three main brands of a certain powder. A set of its 120 sales is examined and found to be allocated among four groups (A, B, C and D) and brands (I, II and III) as shown here under.
(Data table not recoverable.)

Is there any significant difference in brands preference? Answer at 5 percent level. (Take 10 as the code value to subtract it from all given values in your working.)
Solution: Null hypothesis, $H_0: \mu_1 = \mu_2 = \mu_3$ i.e., there is no significant difference in brand preference.
Alternative hypothesis, $H_1: \mu_1 \neq \mu_2 \neq \mu_3$ i.e., there is significant difference in brand preference.
Test statistic: Under $H_0$, F = MSB/MSE

The given data can be coded as,
(Coded data table not recoverable.)
T = -13 - 8 + 21 = 0
n = 12
Correction factor = $T^2/n = (0)^2/12 = 0$

(One way ANOVA table not recoverable.)

Level of significance, $\alpha$ = 0.05
Degree of freedom: (2, 9).
Critical value: $F_{0.05}$ for (2, 9) d.f. = 4.26.
Decision: Since calculated value of F is less than tabulated value, the null hypothesis $H_0$ is accepted. Hence, we conclude that there is no significant difference in brand preference.


Following table gives the monthly sales (in thousand rupees) of a certain firm in three regions by its four salesmen:
(Data table not recoverable.)

Solution: Null hypothesis, $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ i.e., there is no significant difference in average sales due to salesmen.
Alternative hypothesis, $H_1: \mu_1 \neq \mu_2 \neq \mu_3 \neq \mu_4$ i.e., there is significant difference in average sales due to salesmen.
Test statistic: Under $H_0$: F = MSB/MSE

SSB = Sum of square between samples (i.e., salesmen) = 438 - 432 = 6
SSE = Sum of square within samples (i.e., error) = TSS - SSB = 30 - 6 = 24

One way ANOVA Table
Source of variation | d.f. | SS | MS | F ratio
Between samples | 3 | 6 | MSB = 6/3 = 2 | F = 2/3 = 0.67
Within samples (error) | 8 | 24 | MSE = 24/8 = 3 |
Total | 11 | 30 | |

Level of significance: $\alpha$ = 0.05
Degree of freedom: (3, 8).
Critical value: $F_{0.05}$ for (3, 8) d.f. = 4.07
Decision: Since calculated value of F is less than tabulated value, the null hypothesis $H_0$ is accepted. Hence, we conclude that there is no significant difference in average sales due to salesmen.


Example 7.7: The following table represents the sales of three salesmen in four different districts.

Districts | Sales persons (sales in '000)
Kathmandu | 14 | 20 |
Lalitpur | 12 | 23 |
Bhaktapur | 10 | 20 |
Palpa | 8 | 18 |
(The third sales person's column is not recoverable.)

Test whether there is any significant difference in the sales of different districts.
Solution: Null hypothesis, $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ i.e., there is no significant difference in the sales of different districts.
Alternative hypothesis, $H_1: \mu_1 \neq \mu_2 \neq \mu_3 \neq \mu_4$ i.e., there is significant difference in the sales of different districts.
Test statistic: Under $H_0$: F = MSB/MSE

n = 12
Correction factor = $T^2/n = (178)^2/12 = 2640.33$
TSS = $(14)^2 + (20)^2 + \cdots + (18)^2 + (12)^2 - T^2/n$ = 2882 - 2640.33 = 241.67
SSB = Sum of squares between samples (i.e., districts)
= $\frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} + \frac{(\sum X_4)^2}{n_4} - \frac{T^2}{n}$ = 2681.33 - 2640.33 = 41
SSE = Sum of square within samples (i.e., error)
= TSS - SSB = 241.67 - 41 = 200.67

One way ANOVA Table
Source of variation | d.f. | SS | MS | F ratio
Between samples | 4 - 1 = 3 | 41 | MSB = 41/3 = 13.67 | F = 13.67/25.08 = 0.54
Within samples (error) | 8 | 200.67 | MSE = 200.67/8 = 25.08 |
Total | 11 | 241.67 | |

Level of significance: $\alpha$ = 0.05.
Degree of freedom = (3, 8)
Critical value: $F_{0.05}$ for (3, 8) d.f. = 4.07
Decision: Since calculated value of F is less than tabulated value, $H_0$ is accepted. Hence we conclude that there is no significant difference in the sales of different districts.

Null hypothesis, $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ i.e., there is no significant difference in the results of different methods.
Alternative hypothesis, $H_1: \mu_1 \neq \mu_2 \neq \mu_3 \neq \mu_4$ i.e., there is significant difference in the results of different methods.
Test statistic: Under $H_0$: F = MSB/MSE

(Data table for methods I ($X_1$), II ($X_2$), III ($X_3$) and IV ($X_4$) not recoverable.)
Grand total, T = 13 + 18 + 27 + 36 = 94
n = 16
Correction factor = $T^2/n = (94)^2/16 = 552.25$
TSS = $(3)^2 + (4)^2 + \cdots - T^2/n$ = 646 - 552.25 = 93.75
SSB = Sum of square between samples (i.e., methods)
= $\frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} + \frac{(\sum X_4)^2}{n_4} - \frac{T^2}{n}$
= $\frac{(13)^2}{4} + \frac{(18)^2}{4} + \frac{(27)^2}{4} + \frac{(36)^2}{4} - 552.25$ = 77.25
SSE = Sum of square within samples (i.e., error)
= TSS - SSB = 93.75 - 77.25 = 16.5

One way ANOVA Table
Source of variation | d.f. | SS | MS | F ratio
Between samples | 4 - 1 = 3 | 77.25 | MSB = 77.25/3 = 25.75 | F = 25.75/1.375 = 18.73
Within samples (error) | 16 - 4 = 12 | 16.5 | MSE = 16.5/12 = 1.375 |
Total | 15 | 93.75 | |

Level of significance: $\alpha$ = 0.05
Degree of freedom: (3, 12)
Critical value: $F_{0.05}$ for (3, 12) d.f. = 3.49
Decision: Since calculated value of F is greater than the tabulated value, the null hypothesis $H_0$ is rejected i.e., alternative hypothesis $H_1$ is accepted. Hence, we conclude that there is significant difference in the results of different methods.


B: Analysis of Two Way Classified Data

When the total variation present in any set of numerical data is classified according to two ways, it is called two way classification, taking both ways of analysis, row wise and column wise, with more than three variables on both sides.


Let us consider N set of data arranged in m rows and n columns regarding the two factors A and B respectively. Let $y_{ij}$ be the observation in the i-th row and j-th column, i = 1, 2, 3, ..., m and j = 1, 2, 3, ..., n.

         Column 1   Column 2   ...   Column n
Row 1    y_11       y_12       ...   y_1n
Row 2    y_21       y_22       ...   y_2n
...      ...        ...        ...   ...
Row m    y_m1       y_m2       ...   y_mn

The total variation in $y_{ij}$ observations can be split into
a. Variation between the rows.
b. Variation between the columns.
c. Variation within rows and columns, commonly called the inherent variation of the random variable.
The first and second types of variation are due to assignable causes and the third type of variation is due to chance causes.
The main objective of the ANOVA is to examine if there is significant difference between row means and between column means.
Mathematical Model
$y_{pq} = \mu_{pq} + e_{pq}$
$= \mu + (\mu_{p.} - \mu) + (\mu_{.q} - \mu) + (\mu_{pq} - \mu_{p.} - \mu_{.q} + \mu) + e_{pq}$
$= \mu + \alpha_p + \beta_q + \gamma_{pq} + e_{pq}$
p = 1, 2, 3, ..., m;  q = 1, 2, 3, ..., n
Where,
$y_{pq}$ = Observation in the p-th row and q-th column
$\mu$ = General mean effect and is given by $\mu = \frac{1}{N}\sum_{p}\sum_{q}\mu_{pq}$, N = m × n
$\alpha_p = \mu_{p.} - \mu$ = Effect due to p-th row
$\beta_q = \mu_{.q} - \mu$ = Effect due to q-th column
$\gamma_{pq} = \mu_{pq} - \mu_{p.} - \mu_{.q} + \mu$ = Interaction effect due to p-th row and q-th column
$e_{pq}$ = Error due to chance

Here, as there is one observation in each cell, the interaction effect cannot be determined. Hence interaction effect $\gamma_{pq} = 0$.
Now the model reduces to
$y_{pq} = \mu + \alpha_p + \beta_q + e_{pq}$

Assumptions

The hypotheses to be tested are:
$H_{0R}: \mu_{1.} = \mu_{2.} = \cdots = \mu_{m.}$ (i.e. population means of all rows are equal), which can be transformed to $\alpha_1 = \alpha_2 = \alpha_3 = \cdots = \alpha_m = 0$, and
$H_{0C}: \mu_{.1} = \mu_{.2} = \cdots = \mu_{.n}$ (i.e. population means of all columns are equal), which can be transformed to $\beta_1 = \beta_2 = \beta_3 = \cdots = \beta_n = 0$.
$H_{1R}$: At least one $\mu_{i.}$ is different, i = 1, 2, 3, ..., m.
$H_{1C}$: At least one $\mu_{.j}$ is different, j = 1, 2, 3, ..., n.
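A minimal Python sketch of the two-way analysis with one observation per cell (so no interaction term) is given below. It follows the correction-factor method used in the worked examples of this chapter. The function name `two_way_anova`, the dictionary keys and the data table are illustrative assumptions.

```python
def two_way_anova(table):
    """Two-way ANOVA, one observation per cell.

    table: list of m rows, each a list of n column values.
    Returns sums of squares, degrees of freedom, mean squares and
    the two F ratios (columns and rows, each against the error).
    """
    m, n = len(table), len(table[0])
    total = sum(sum(row) for row in table)               # grand total T
    correction = total ** 2 / (m * n)                    # T^2 / N
    tss = sum(x * x for row in table for x in row) - correction
    ssr = sum(sum(row) ** 2 for row in table) / n - correction
    ssc = sum(sum(col) ** 2 for col in zip(*table)) / m - correction
    sse = tss - ssr - ssc
    df_r, df_c = m - 1, n - 1
    df_e = df_r * df_c
    msr, msc, mse = ssr / df_r, ssc / df_c, sse / df_e
    return {"TSS": tss, "SSR": ssr, "SSC": ssc, "SSE": sse,
            "df": (df_r, df_c, df_e),
            "MSR": msr, "MSC": msc, "MSE": mse,
            "F_R": msr / mse, "F_C": msc / mse}


# Hypothetical 3 x 3 table (rows and columns are the two factors).
result = two_way_anova([[5, 7, 9], [6, 9, 8], [7, 8, 12]])
print(result["df"], round(result["F_R"], 2), round(result["F_C"], 2))
```

F_R is compared with the tabulated F for (m - 1, (m - 1)(n - 1)) degrees of freedom, and F_C with the tabulated F for (n - 1, (m - 1)(n - 1)) degrees of freedom.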
Worked Out of Examples 


Example 7.9: A company has four marketing executives ME1, ME2, ME3 and ME4. They work in three cities Kathmandu, Bhaktapur and Patan. The table below shows the sales in ten thousand rupees per month.
(Data table not recoverable.)

Carry out two way-ANOVA and interpret the results.
Solution: Let $X_1$, $X_2$, $X_3$ and $X_4$ be sales of marketing executives ME1, ME2, ME3 and ME4 respectively.
Again, let $X_K$, $X_B$ and $X_P$ be sales in three cities Kathmandu, Bhaktapur and Patan respectively.

Total sum of square, SST = $\sum x^2$ - CF
= $(30)^2 + (70)^2 + \cdots + (80)^2$ - 43200
= 49400 - 43200
= 6200
Sum of square between columns (i.e., marketing executives)
SSC = $\frac{(\sum X_1)^2}{3} + \frac{(\sum X_2)^2}{3} + \frac{(\sum X_3)^2}{3} + \frac{(\sum X_4)^2}{3}$ - CF
= $\frac{(210)^2}{3} + \frac{(180)^2}{3} + \cdots$ - 43,200 = 600
Sum of square between rows (i.e., cities)
SSR = $\frac{(\sum X_K)^2}{n_K} + \frac{(\sum X_B)^2}{n_B} + \frac{(\sum X_P)^2}{n_P}$ - CF
= $\frac{(160)^2}{4} + \frac{(240)^2}{4} + \frac{(320)^2}{4}$ - 43,200 = 46,400 - 43,200 = 3200
Sum of square within sample (i.e., error)
SSE = SST - SSC - SSR = 6200 - 600 - 3200 = 2400


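Two way ANOVA Table
Source of variation | Sum of squares | d.f. | Mean sum of squares | F ratio
Between columns (marketing executives) | 600 | 4 - 1 = 3 | MSC = 600/3 = 200 | $F_C$ = 200/400 = 0.5
Between rows (cities) | 3200 | 3 - 1 = 2 | MSR = 3200/2 = 1600 | $F_R$ = 1600/400 = 4
Within samples (error) | 2400 | 3 × 2 = 6 | MSE = 2400/6 = 400 |
Total | 6200 | 11 | |

Test of significant difference in average sales due to marketing executives:
Null hypothesis, $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ i.e., there is no significant difference in average sales of four marketing executives.
Alternative hypothesis, $H_1: \mu_1 \neq \mu_2 \neq \mu_3 \neq \mu_4$ i.e., there is significant difference in average sales of four marketing executives.
Test statistic: Under $H_0$, the test statistic is $F_C$ = MSC/MSE = 0.5
Level of significance, $\alpha$ = 5% = 0.05
Degree of freedom, d.f. = (3, 6)
Critical value: $F_{tab} = F_{0.05}(3, 6)$ = 4.76
Decision: Since $F_C$ = 0.5 < $F_{tab}$ = 4.76, there is no reason to reject null hypothesis. Hence we conclude that there is no significant difference in average sales of four marketing executives.

Test of significant difference in average sales between cities:
Test statistic: $F_R$ = MSR/MSE = 4, with d.f. = (2, 6) and critical value $F_{tab} = F_{0.05}(2, 6)$ = 5.14.
Decision: Since $F_R$ = 4 < $F_{tab}$ = 5.14, there is no reason to reject null hypothesis. Hence we conclude that there is no significant difference in average sales between three cities, Kathmandu, Bhaktapur and Patan.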


Example 7.10: The following table gives the data on the performance of three different detergents at three different water temperatures. The performance was obtained on 'whiteness' readings based on specially designed equipment for nine loads of washing.

Water temperature | Detergent A | Detergent B | Detergent C
Cold water | 45 | 43 | 55
Warm water | 37 | 40 | 56
Hot water | 42 | 44 | 46

Perform a two-way analysis of variance, using the level of significance $\alpha$ = 0.05.

Solution: (a) Null hypothesis $H_0: \mu_A = \mu_B = \mu_C$ i.e., there is no significant difference in whiteness due to different types of detergents.
Alternative hypothesis $H_1: \mu_A \neq \mu_B \neq \mu_C$ i.e., there is significant difference in whiteness due to different types of detergents.
Test statistic: Under $H_0$: $F_C$ = MSC/MSE
(b) Null hypothesis, $H_0: \mu_1 = \mu_2 = \mu_3$ i.e., there is no significant difference in whiteness due to different water temperatures.
Alternative hypothesis $H_1: \mu_1 \neq \mu_2 \neq \mu_3$ i.e., there is significant difference in whiteness due to different water temperatures.
Test statistic: Under $H_0$: $F_R$ = MSR/MSE

Row totals: Cold water ($X_1$): $\sum X_1$ = 143; Warm water ($X_2$): $\sum X_2$ = 133; Hot water ($X_3$): $\sum X_3$ = 132
Column totals: $\sum X_A$ = 124, $\sum X_B$ = 127, $\sum X_C$ = 157
T = 124 + 127 + 157 = 408
n = 9
Correction factor = $T^2/n = (408)^2/9 = 18496$
TSS = $(45)^2 + (43)^2 + \cdots + (44)^2 + (46)^2 - T^2/n$ = 18,820 - 18496 = 324

SSC = Sum of square between columns (i.e., detergents)
= $\frac{(\sum X_A)^2}{n_A} + \frac{(\sum X_B)^2}{n_B} + \frac{(\sum X_C)^2}{n_C} - \frac{T^2}{n}$ = $\frac{(124)^2}{3} + \frac{(127)^2}{3} + \frac{(157)^2}{3} - 18496$
= 222
SSR = Sum of square between rows (i.e., water temperature)
= $\frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} - \frac{T^2}{n}$ = $\frac{(143)^2}{3} + \frac{(133)^2}{3} + \frac{(132)^2}{3} - 18496$
= 24.67
SSE = Sum of square within samples (i.e., error)
= TSS - SSC - SSR = 324 - 222 - 24.67 = 77.33
Two way ANOVA Table
Source of variation | Sum of squares | d.f. | Mean sum of squares | F ratio
Between detergents (columns) | 222 | 2 | MSC = 222/2 = 111 | $F_C$ = 111/19.33 = 5.74
Between water temperatures (rows) | 24.67 | 2 | MSR = 24.67/2 = 12.33 | $F_R$ = 12.33/19.33 = 0.64
Error | 77.33 | 4 | MSE = 77.33/4 = 19.33 |
Total | 324 | 8 | |

Level of significance: $\alpha$ = 0.05
a. $F_C$ = 5.74
Degree of freedom: (2, 4)
Critical value: $F_{0.05}$ for (2, 4) d.f. = 6.94
Decision: Since calculated value of F is less than tabulated value of F, the null hypothesis $H_0$ is accepted. Hence we conclude that there is no significant difference in whiteness due to different types of detergents.
b. $F_R$ = 0.64
d.f. = (2, 4)
Critical value = $F_{0.05}$ for (2, 4) = 6.94
Decision: Since the calculated value of F is less than the tabulated value of F, the null hypothesis $H_0$ is accepted. Therefore, we conclude that there is no significant difference in whiteness due to different water temperatures.


Example 7.11: Set up two-way ANOVA table for the following per hectare yield for 4 varieties of wheat.
(Table of yields by treatments and types of land I, II and III: values not recoverable.)
Test whether there is significant difference in average yield of wheat due to (i) four different treatments and (ii) three different types of land.

Solution: (a) Null hypothesis, $H_0: \mu_A = \mu_B = \mu_C = \mu_D$ i.e., there is no significant difference in average yield of wheat due to four different treatments.
Alternative hypothesis, $H_1: \mu_A \neq \mu_B \neq \mu_C \neq \mu_D$ i.e., there is significant difference in average yield of wheat due to four different treatments.
Test statistic: Under $H_0$: $F_C$ = MSC/MSE
(b) Null hypothesis, $H_0: \mu_1 = \mu_2 = \mu_3$ i.e., there is no significant difference in the average yield of wheat due to different types of land.
Alternative hypothesis, $H_1: \mu_1 \neq \mu_2 \neq \mu_3$ i.e., there is significant difference in the average yield of wheat due to different types of land.
Test statistic: Under $H_0$: $F_R$ = MSR/MSE

Row totals (types of land): 219, 218, 223
Column totals (treatments): $\sum X_A$ = 165, $\sum X_B$ = 164, $\sum X_C$ = 165, $\sum X_D$ = 166
T = 219 + 218 + 223 = 660
n = 12
Correction factor = $T^2/n = (660)^2/12 = 36300$
TSS = $(53)^2 + \cdots + (54)^2 + (57)^2 - T^2/n$ = 20
SSR = Sum of square between rows (i.e., types of land)
= $\frac{(219)^2}{4} + \frac{(218)^2}{4} + \frac{(223)^2}{4} - 36300$ = 3.5
SSC = Sum of square between columns (i.e., treatments)
= $\frac{(165)^2}{3} + \frac{(164)^2}{3} + \frac{(165)^2}{3} + \frac{(166)^2}{3} - 36300$ = 0.67
SSE = Sum of square within samples (i.e., error)
= TSS - SSR - SSC = 20 - 3.5 - 0.67 = 15.83

Two way ANOVA Table
Source of variation | Sum of squares | d.f. | MSS | Test statistic
Between treatments (columns) | 0.67 | 3 | MSC = 0.67/3 = 0.22 | $F_C$ = MSC/MSE = 0.22/2.64 = 0.08
Between types of land (rows) | 3.5 | 2 | MSR = 3.5/2 = 1.75 | $F_R$ = MSR/MSE = 1.75/2.64 = 0.66
Due to error | 15.83 | 6 | MSE = 15.83/6 = 2.64 |
Total | 20 | 11 | |

Level of significance: $\alpha$ = 0.05
a. $F_C$ = 0.08
d.f. = (3, 6)
Critical value: $F_{0.05}$ for (3, 6) d.f. = 4.76
Decision: Since calculated value of F is less than tabulated value of F, the null hypothesis $H_0$ is accepted. Therefore, we conclude that there is no significant difference in average yield of wheat due to different treatments.
b. $F_R$ = 0.66
d.f. = (2, 6)
Critical value: $F_{0.05}$ for (2, 6) d.f. = 5.14
Decision: Since the calculated value of F is less than tabulated value, the null hypothesis $H_0$ is accepted. Therefore, we conclude that there is no significant difference in average yield of wheat due to different types of land.

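Example 7.12: The following table gives the data on production per day turned out by 4 different workmen using three different machine types.
(Data table not recoverable.)
a. Test whether the mean productivity is the same for the three different machine types.
b. Test whether 4 workmen differ with respect to mean productivity.

Solution: (a) Null hypothesis, $H_0$: there is no significant difference in mean productivity due to three types of machines.
Alternative hypothesis, $H_1$: there is significant difference in mean productivity due to three types of machines.
(b) Null hypothesis, $H_0$: there is no significant difference in mean productivity due to different workmen.
Alternative hypothesis, $H_1$: there is significant difference in mean productivity due to different workmen.

n = 12, T = 129 + 138 + 114 + 117 = 498
Correction factor = $T^2/n = (498)^2/12 = 20667$
TSS = $(44)^2 + (38)^2 + \cdots + (38)^2 + (46)^2 - T^2/n$ = 21046 - 20667 = 379
SSC = Sum of square between columns (i.e., machine types)
= $\frac{(152)^2}{4} + \frac{(189)^2}{4} + \frac{(157)^2}{4} - 20667$
= 201.5
SSR = Sum of square between rows (i.e., workers)
= $\frac{(129)^2}{3} + \frac{(138)^2}{3} + \frac{(114)^2}{3} + \frac{(117)^2}{3} - 20667$ = 123
SSE = Sum of square within samples (i.e., error)
= TSS - SSC - SSR = 379 - 201.5 - 123 = 54.5

Two way ANOVA Table
Source of variation | Sum of squares | d.f. | MSS | Test statistic
Between machine types (columns) | 201.5 | 2 | MSC = 100.75 | $F_C$ = MSC/MSE = 100.75/9.08 = 11.1
Between workmen (rows) | 123 | 3 | MSR = 41 | $F_R$ = MSR/MSE = 41/9.08 = 4.52
Error | 54.5 | 6 | MSE = 9.08 |
Total | 379 | 11 | |

Level of significance: $\alpha$ = 0.05
a. $F_C$ = 11.1, d.f. = (2, 6)
Critical value: $F_{0.05}$ for (2, 6) d.f. = 5.14
Decision: Since the calculated value of F is greater than the tabulated value of F, the null hypothesis $H_0$ is rejected i.e., alternative hypothesis $H_1$ is accepted. Hence, we conclude that there is significant difference in mean productivity of three types of machines.
b. $F_R$ = 4.52, d.f. = (3, 6)
Critical value: $F_{0.05}$ for (3, 6) d.f. = 4.76
Decision: Since the calculated value of F is less than tabulated value of F, the null hypothesis $H_0$ is accepted. Therefore, we conclude that there is no significant difference in mean productivity due to different workmen.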


Example 7.13: Prepare one way analysis of variance table and carry out the test for significant difference in the average yields between three different varieties of seed. Given:
Total sum of squares = 258
Sum of square between varieties of seed = 50
Total number of observations = 20

Solution: Here, we have given
Total sum of squares (TSS) = 258
Sum of square between varieties of seeds i.e., samples (SSB) = 50
Total number of observations (n) = 20
Number of varieties of seeds = 3
Now, SSE = TSS - SSB = 258 - 50 = 208
Null hypothesis, $H_0: \mu_1 = \mu_2 = \mu_3$ i.e., there is no significant difference in average yield of three varieties of seeds.
Alternative hypothesis, $H_1: \mu_1 \neq \mu_2 \neq \mu_3$ i.e., there is significant difference in average yield of three varieties of seeds.
Test statistic: Under $H_0$: F = MSB/MSE

One way ANOVA Table
Source of variation | SS | d.f. | MS | F ratio
Between samples | 50 | 3 - 1 = 2 | MSB = 50/2 = 25 | F = MSB/MSE = 25/12.24 = 2.04
Within samples | 208 | 20 - 3 = 17 | MSE = 208/17 = 12.24 |
Total | 258 | 19 | |

Level of significance: $\alpha$ = 0.05
Degree of freedom: (2, 17)
Critical value: $F_{0.05}$ for (2, 17) d.f. = 3.59
Decision: Since calculated value of F is less than tabulated value, the null hypothesis $H_0$ is accepted. Hence we conclude that there is no significant difference in average yield of three varieties of seed.


——————————————— LE 


Example 7.14: In a certain factory, production can be accomplished by four different workers on five different types of machines. A sample study is being made with two-fold objectives of examining whether the four workers differ with respect to mean productivity and whether the mean productivity is the same for the five different machines. The researcher involved in this study reports the following:
Sum of squares for variation between machines = 35.2
Sum of squares for variation between workmen = 53.8
Total sum of squares for variation = 174.2
Set up ANOVA table for the given information and draw the inference about variances at 5% level of significance.
Solution: Here we have given,
SSC = sum of squares for variance between machines = 35.2
SSR = sum of squares for variance between workmen = 53.8
TSS = total sum of square for variance = 174.2
Number of machines (i.e., number of columns) = 5
Number of workers (i.e., number of rows) = 4
Now,
SSE = TSS - SSC - SSR = 174.2 - 35.2 - 53.8 = 85.2
a. Null hypothesis, $H_0$: There is no significant difference in mean productivity due to different machines.
Alternative hypothesis, $H_1$: There is significant difference in mean productivity due to different machines.
Test statistic: Under $H_0$: $F_C$ = MSC/MSE
b. Null hypothesis, $H_0$: There is no significant difference in mean productivity due to different workmen.
Alternative hypothesis, $H_1$: There is significant difference in mean productivity due to different workmen.
Test statistic: Under $H_0$: $F_R$ = MSR/MSE

Two way ANOVA Table
Source of variation | Sum of squares | d.f. | MSS | Test statistic
Due to machines | 35.2 | 5 - 1 = 4 | MSC = 35.2/4 = 8.8 | $F_C$ = MSC/MSE = 8.8/7.1 = 1.24
Due to workmen | 53.8 | 4 - 1 = 3 | MSR = 53.8/3 = 17.93 | $F_R$ = MSR/MSE = 17.93/7.1 = 2.53
Due to error | 85.2 | 4 × 3 = 12 | MSE = 85.2/12 = 7.1 |
Total | 174.2 | 19 | |

Level of significance: $\alpha$ = 0.05
a. $F_C$ = 1.24, Degree of freedom: (4, 12)
Critical value: $F_{0.05}$ for (4, 12) d.f. = 3.26
Decision: Since the calculated value of F is less than tabulated value, the null hypothesis $H_0$ is accepted. Hence, we conclude that there is no significant difference in mean productivity due to different machines.
b. $F_R$ = 2.53, d.f. = (3, 12)
Critical value: $F_{0.05}$ for (3, 12) d.f. = 3.49.
Decision: Since the calculated value of F is less than the tabulated value of F, the null hypothesis $H_0$ is accepted. Therefore, we conclude that there is no significant difference in mean productivity due to different workmen.
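When only the sums of squares and the numbers of rows and columns are reported, the two-way table can be assembled with a few lines of Python. The function name `two_way_table_from_ss` and its parameter names are hypothetical; the figures passed in are the ones given in this example (total 174.2, machines 35.2, workmen 53.8, five machines, four workers).

```python
def two_way_table_from_ss(tss, ssc, ssr, n_cols, n_rows):
    """Build a two-way ANOVA summary from reported sums of squares."""
    sse = tss - ssc - ssr                 # error sum of squares by subtraction
    df_c, df_r = n_cols - 1, n_rows - 1
    df_e = df_c * df_r                    # one observation per cell
    msc, msr, mse = ssc / df_c, ssr / df_r, sse / df_e
    return {"SSE": sse, "df": (df_c, df_r, df_e),
            "MSC": msc, "MSR": msr, "MSE": mse,
            "F_C": msc / mse, "F_R": msr / mse}


summary = two_way_table_from_ss(tss=174.2, ssc=35.2, ssr=53.8,
                                n_cols=5, n_rows=4)
print({name: (value if name == "df" else round(value, 2))
       for name, value in summary.items()})
```

Each F ratio is then compared with the tabulated value for its own pair of degrees of freedom, (4, 12) for machines and (3, 12) for workmen in this example.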


Example 7.15: There are some missing values in the following ANOVA table. The data relate to three different types of packaging.

Source of variation | Sum of squares (SS) | Degrees of freedom (df) | Mean sum of squares (MS) | F ratio
Between samples | | | |
Within samples | | | |
(Given entries not recoverable.)

Solution:

Source of variation | Sum of squares | Degrees of freedom | Mean sum of squares | F ratio
Between samples | 40 | 14 - 12 = 2 | 40 ÷ 2 = 20 | 20 ÷ 12 = 1.67
Within samples | 184 - 40 = 144 | 12 | 144 ÷ 12 = 12 |

Tabulated value of F(2, 12) = 3.89 at 5% level of significance.
Decision: $F_{cal} < F_{tab}$, we accept $H_0$.


Example 7.16: Consider the following ANOVA table, based on information obtained for three randomly selected samples from three independent populations, which are normally distributed with equal variances.

Source of variation | Sum of squares | Degrees of freedom (df) | Mean sum of squares
Between samples | | |
Within samples | | |
(Table entries not recoverable.)

Test whether three populations have same mean.


Exercise 7.1 


1. What do you mean by design of experiment?
2. Write the different terminology which are used in design of experiment.
3. What are the principles of design of experiment?
4. What do you understand by analysis of variance


ation? 


Agriculture 
Manufacturing 


Community and social service 


Does the labor productivity index vary due to the difference i) in the sector ii) in the time period?
2. Two independent samples of 7 and 6 items respectively have the following values of variables:
Weights in lbs.
(Data values not recoverable.)

Do the two estimates of population variance differ significantly? [Ans. $F_{cal}$ = 1.02; $F_{tab}$ = 4.39]

3. Two random samples drawn from population are: 
er 
Bempe [17 a3 98 96 as 


[Ans. fica) = 1.16, Ho accept] 
4. Test whether two populations have the sa 


me variance or not from the following: 
Sample I 


5. Two types of Pf random samp] 
es of 
characterized ag follows Pies of sizes 12 and 9, drawn from tw 
Popul. 


ation characteristics 


[Ans. 1.31; Ho is accepted] 


© normal populations, a 


Y found a y 


f aria 
Variance of 2000 a ibe nce of 2 


from a Sa 
of 1] Materials. Test one - na of materia 


© populations h 


[Ans, 1.33, Hy is accent 


a 
: ‘ ‘or and 
ls from first supplie? 
ave same variance. 


_ 3,02] 
[Ans, Fea =1]] 25: fab = Fo.os, (9. ) #3 


| 


a | 


Js there any SIS 
code yalue to $ 


A certain company had four salesmen A, B, C and D, each of whom was sent for a month to three types of area: countryside X, outskirts of a city Y and shopping centre of city Z. The sales in hundred rupees per month are shown below:

i) Is there a significant difference in the sales made by the four salesmen?
ii) Is there a significant difference in the sales made in different cities?
9. An agriculture research organization wants to study the effect of four types of fertilizers at random in lots of land. Part of the calculation is shown below:

Source of variation | Sum of square | Degree of freedom | Mean sum of square | Test statistic
Between fertilizers 2940
Within samples

i) Fill in the blanks in the ANOVA table.
ii) Test at 5% level of significance whether the fertilizers differ significantly.

10. Four salesmen were posted in different areas by a company. The number of units of commodity sold by them are as follows.
A: 20 23 28 29
B: 25 32 30 -
C: 23 28 35 18
D: 15 21 19 25
Is there a significant difference in the performance of these salesmen?


11. The following data relates to the goals scored by three teams in five matches.
Team A: 2 5 : 5 1 
Team B: 4 2 2 1 = 
; 4 
12. A research company has designed three different systems to clean up oil spills. The following table contains the results, measured by how much surface area (in square meters) is cleared in one hour. The data were found by testing each method in several trials. Are the three systems equally effective? Use the 0.05 level of significance.


De ————————— Ll le 


nts chosen 


stude s were fou 


Their score 


‘ ; ~ 
st was given (0 - 
est was 8 


13. Al " valley. 


of Kathmand 


3 z 


Perform analysis of variance and show if there is any significant difference between the scores of students in the three campuses.


Two Way Classification of ANOVA

14. A garment company appoints four salesmen A, B, C and D and observes their sales in three seasons: summer, winter and monsoon. The figures (in lakhs) are given in the following table.

Salesmen


zi 
es a 
Monsoon 28 


Carry out an analysis of variance and test whether there is any significant difference in the salesmen 
and in the seasons so far as sales are concerned. 


15. To study the performance of three detergents and three different water temperatures, the following whiteness readings were obtained with specially designed equipment.

Detergent B

Perform a two way analysis of variance using 5% level of significance.


= 190 
rae Total number of observations =15 
- Frepare two Way analysis i 
the mean Sales due to ores oe ae wee 


1) 3d “TTY Out the test for the significance difference " 
different Salespersons 


ii i 
) 4 different 8€0graphica] Tegions, wh 
» Where 


Total sum of eas 
tae aisle Owing information are given 
quare between Salespersons — 

UM Of square betwe — 


“1 regions = 49 


21. Prepare two way analysis of variance table and ca 


Multiple Choice Questions circle (O) t 
1. The ratio of between sample variance and wi 


2, Analysis of variance utilizes 


A certain company had four salesmen A, B, C and D, each of whom was sent for a month to three types of area: countryside X, outskirts of a city Y and shopping centre of city Z. The sales in hundred rupees per month are shown below:

Is there a significant difference in the sales made in different cities?


[Ans. (i) F = 0.5 ace . 

‘ -) accept Hp; (ii) F=4 

A garment company appoints four salesmen A, B, C and D and observes thei 05 (il) mk accept Ho] 

summer, winter and monsoon. The figures (in lakhs) are given in the levi ol in three seasons- 
able: 


same | a 
Se 
winter | ___28 53 

: 


Carry out an analysis of variance and test whether there is any significant difference in the salesmen and in the seasons so far as sales are concerned. [Ans. (i) F = 0.662, accept H0 (ii) F = 0.71, accept H0]


19. 


Salesmen 


20. To study the performance of three detergents and three different water temperatures, the following whiteness readings were obtained with specially designed equipment.


Detergent C 
[Weer Texpeenre | Deere _[_?esgats —| 
ee nes aE ee reas nels ar eee 
eae: oer esau” Danes nee 

Perform a two way analysis of variance using 5% level of significance. 

[Ans. F = 8.4, accept H0; F = 4.99, accept H0]


rry out the test for the significance difference in 


the mean sales due to 

i) 3 different sales persons 

ii) 4 different geographical regions, where followin 
Total sum of squares = 210 ‘ 
Sum of square between regions = 42 [Ans. (i) F=0.71 ois 


Exercise 7.3 


g information are given 


he correct answer. 


thin samp! 
(c) t-distribution 


e variance follows. 

i d) T-distribution 
(a) F-distribution (b) x7-distribution (d) 
d) t-test 
(@) Fetest (by 7C-test (c) Hest (a) 


. The hypothesis unde 


rst given by, 


(a) &,n- / 
| The idea of testing of hypothesis was Il . ‘ 
(a) R.A. Fisher (b) J. Neym 
t 1S 
r test i . 


(a) simple hypothesis s 


(c) null hypothesis 


6. The ANOVA test is of : 
(a) single degree of freedom (b) 
(c) zero degree of freedom (d) 
7, The mean sum of square due to error is given by, 
r SSE 
() = 0) (©) 
8. Which is not principle of design of experiment? 
(a) Replication (b) Randomization (Cc) 
9. The correlation factor in ANOVA is calculation by 
f ie 
@ 7 (b) (c) 
10. ANOVA is best on the test 
(a) z-test (b) test () 


duswer Key 


rr 


- 


(@) k-1n_, 


E.L. Lehman (d) A wald 


alternative hypothesis 
composite hypothesis 


double of degree of freedom 


three degree of freedom 


SSE=TSS—SSR (4) -MSS=TSS~- ssp 
| 
Local control (d) Experimental unit 
vie 
7, ~ SSE (d) MSS =TSS— ssp 
: | 
x2-test (d) f-test | 
8 Laboratory Works

8.1 Use of SPSS in Statistical Computations

8.1.1 Introduction

SPSS (Statistical Package for the Social Sciences) is software for editing and analyzing data. These data may come from basically any source, such as scientific research. SPSS can open all file formats that are commonly used for structured data, such as
i) spreadsheets from MS Excel or Open Office,
ii) plain text files (.txt or .csv),
iii) relational (SQL) databases,
iv) Stata and SAS.

When you use SPSS, you work in several windows: the data view, the variable view, the output view, and the script view. Eventually you will also use the syntax editor to save or refine your queries.

Note. In this text, we will use IBM SPSS Statistics 20.


The data view: The data view displays your actual data and any new variables you have created.

The variable view: At the bottom of the data window there is a tab labelled Variable View. The variable view window contains the definitions of each variable in your data set, including its name, type, label, size, alignment, and other information.

1. Click the Variable View tab.
2. Review the information in the rows for each variable.
3. Click the Data View tab to return to the data.


Output Window View: The output window is where you see the results of your various queries, such as frequency distributions, cross-tabs, statistical tests, and charts. If you have worked with Excel, you are probably used to seeing all your work on one page: charts, data, and calculations. In SPSS, each window handles a separate task. The output window is where you see your results.


8.1.2 Creating a New Data Set

Here we will create a new data set, define a set of variables, and then enter some data in the variables. We will also create some automatic data entry constraints to improve the accuracy of our data entry.

Project 8.1 | Enter the following values in SPSS.


ric, date, string, and binary. 


window. 


1. Click the Variable View tab at the bottom of the window.
2. Click the Name column (referring to the name of the variable).
3. With the cursor in the first row, type: Weight; on the second row type: Midvalue; on the third row type: Frequency.
4. In the Type column, click Numeric to open the Variable Type dialog box.
5. Select (click) String.
7. Click OK.
8. Press Tab or Enter three times to move to the Label column.
9. Type "The weight of 63 students of a school".
10. Click the Data View tab at the bottom of the window.
11. Enter the values as follows. Then save the file.


Project 8.4 | Box-plot, Pie-chart and Line Graph

Enter the following values in SPSS and create a box-plot, a pie-chart and a line graph.

Cars
Trucks
Motorcycles
Buses
Others

Percentage


(i) Box-plot

Solution:
1. Enter the data as in Project 8.1.
2. Select Graphs => Legacy Dialogs => Boxplot.
3. Click the Boxplot tab. Select Summaries of separate variables.
4. Click OK.

(ii) Pie Chart
1. Enter the data as in Project 8.1.
2. Select Graphs => Legacy Dialogs => Pie => Values of individual cases.
3. Click Define.
4. Click OK.


(iii) Line Graph

1. Select Graphs => Legacy Dialogs => Line => Values of individual cases.
2. Click Define.
3. Click OK.

[Line graph: Percentage of types of vehicles on the roads of a city. Categories: Cars, Trucks, Buses, Motorcycles, Others. Vertical axis: Value Percentage.]


8.1.4 Use of SPSS in Descriptive Statistics 


Mean for Ungrouped Data

Enter the following values in SPSS:
Weight: 25, 35, 45, 55, 65, 75

Solution:
1. Select Analyze => Descriptive Statistics.
2. Click Descriptives => Move Weight into Variable(s).
3. Click Options. Select Mean.
4. Click OK.


Statistic | Statistic | Statistic | Statistic | Statistic Statistic 


25 15 50.00 18.708 0.000 
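The mean (50.00) and standard deviation (18.708) in the SPSS output above can be cross-checked outside SPSS. The following is a minimal sketch in Python, which is not one of the tools used in this text; the variable names are illustrative, and it assumes the standard deviation reported by SPSS is the sample value (divisor n - 1).

```python
import statistics

# Weights entered in the project above
weights = [25, 35, 45, 55, 65, 75]

mean_w = statistics.mean(weights)         # arithmetic mean
sd_w = statistics.stdev(weights)          # sample s.d., divisor n - 1
range_w = max(weights) - min(weights)     # largest value minus smallest value

print(f"mean  = {mean_w:.2f}")
print(f"s.d.  = {sd_w:.3f}")
print(f"range = {range_w}")
```

If the population standard deviation (divisor n) is wanted instead, `statistics.pstdev` can be used in place of `statistics.stdev`.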


Mean for Ungrouped Data

Enter the following values in SPSS and calculate mean, s.d., range etc.

225, 235, 245, 255, 265, 275
Mean for grouped data

Enter the following values in SPSS and find the mean, median, mode, standard deviation, and percentiles:

Solution:
1. Enter the data as in Project 8.1.
2. Select Data => Weight Cases.
3. Select Weight cases by and move Frequency into Frequency Variable.
4. Click OK. Select Analyze => Descriptive Statistics => Frequencies.
5. Move Midvalue into Variable(s).
6. Click Statistics. Select Mean.
7. Click Continue. Click OK.

[Output: frequency table with columns Frequency, Percent, Valid Percent, Cumulative Percent]
Mean for grouped data
Enter the following values in SPSS and find the mean, median, mode, standard deviation, and percentiles:
Weight | Midvalue | Frequency

35 4 

45 7 

a5 7 

: a 

75 5 


8.1.5 Use of SPSS in Probability Distributions

Project 8.9 | (Binomial Distribution)
Using SPSS, find the binomial probability distribution of a binomial random variable X with n = 10 and p = 0.65. Also, find (i) P(X = 8); (ii) P(X = 10); (iii) P(X ≤ 5); (iv) P(X < 6).

Solution: For binomial probability distribution function (PDF)
1. Select Transform => Compute Variable.
2. Type a variable name in Target Variable. Select PDF & Noncentral PDF in Function group.
3. Select Pdf.Binom in Functions and Special Variables. Click on the up arrow. Then in Numeric Expression we will see PDF.BINOM(?,?,?). Put values PDF.BINOM(X,10,.65). Click OK.
Similarly, for binomial cumulative distribution function (CDF)

1. Select Transform => Compute Variable.
2. Type a variable name in Target Variable. Select CDF & Noncentral CDF in Function group.
3. Select Cdf.Binom in Functions and Special Variables. Click on the up arrow. Then in Numeric Expression we will see CDF.BINOM(?,?,?). Put values CDF.BINOM(X,10,.65). Click OK.
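The PDF and CDF columns that SPSS computes for X = 0, 1, ..., 10 can be reproduced by hand from the binomial formula P(X = x) = C(n, x) p^x (1 - p)^(n - x). The sketch below is in Python rather than SPSS; the function names `binom_pmf` and `binom_cdf` are illustrative and are not SPSS functions.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    """P(X <= x): sum of the individual terms from 0 up to x."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

n, p = 10, 0.65

# Full distribution: one row per value of X
for x in range(n + 1):
    print(f"{x:2d}  pdf = {binom_pmf(x, n, p):.6f}  cdf = {binom_cdf(x, n, p):.6f}")

p_eq_8 = binom_pmf(8, n, p)     # (i)   P(X = 8)
p_eq_10 = binom_pmf(10, n, p)   # (ii)  P(X = 10)
p_le_5 = binom_cdf(5, n, p)     # (iii) P(X <= 5)
p_lt_6 = binom_cdf(5, n, p)     # (iv)  P(X < 6) equals P(X <= 5) for integer X
```

Parts (iii) and (iv) give the same value because X takes only whole-number values, so X < 6 and X ≤ 5 describe the same event.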


8.1.6 Use of SPSS in Correlation and Regression

Project 8.10 | For Regression

Enter the following values in SPSS and find the regression equation of y on x:

Solution:
1. Enter values of the variables X and Y.
2. Select Analyze => Regression => Linear.
3. Move Y into Dependent and X into Independent(s). Then, click OK.

Now we get the following output

Model Summary
Std. Error of the Estimate: .878
a. Predictors: (Constant), X

ANOVA
Model | Sum of Squares | df | Mean Square | F | Sig.
Regression | 24.143 | 1 | 24.143 | 31.296 | .003
Residual | | 5 | .771 | |
Total | | 6 | | |
a. Dependent Variable: Y
b. Predictors: (Constant), X

Coefficients (Unstandardized Coefficients)


The fitted regression |j 


neisY¥=7.714 _ 0 
S 929 .X. 
Regression 


For Correlation

Enter the following values in SPSS and find the correlation between X and Y:

Solution:
1. Enter values of the variables X and Y.
2. Select Analyze => Correlate => Bivariate.
3. Move X and Y into Variable(s). Select Pearson. Then, click OK.

Then we will get 


Pearson Correlation 
Sig. (2-tailed) 


N 
Pearson Correlation 


Sig. (2-tailed) 


N 


Project 8.13 | Correlation

Enter the following values in SPSS and find the correlation between X and Y:


8.2 Use of Microsoft Excel in Statistical Computations

Spreadsheets are now available to compute probabilities as well as to perform other statistical calculations. For example, some statistical functions available in Microsoft Excel are illustrated below, which can be used to carry out many probability calculations and to do many exercises in the text.


8.2.1 Diagram and Graph

Project 8.14 | Six basic error categories that have been identified at a particular store by the team

Mistake Type | Number of mistakes
Item Not Charged
Bottom of Basket
Double Charge
Wrong Item Charged
Item Hidden
Coupon Error

Express the following using pie-chart.

Solution. (i) Insert data in spreadsheet, (ii) Select data as shown in dialog box, (iii) Go to Insert, (iv) Select Charts, (v) Select Pie.



Similarly, we can draw other diagrams and graphs 


8.2.2 Use of Excel in Descriptive Statistics 


Project | Consider example (2). Find the linear correlation between x and y.

Process: (i) Insert data in spreadsheet, (ii) Click on insert function fx, (iii) Select a category in the Insert Function dialog box, (iv) Select a function: CORREL, (v) Click on OK. Then the following dialog box will appear. Insert the data ranges in Array1 and Array2; the correlation is 0.9772.
‘, 


[Excel Function Arguments dialog box for =CORREL(A2:A7,B2:B7): Array1 = A2:A7 = {30;35;42;48;50;51}, Array2 = B2:B7 = {33;41;46;52;59;55}. Returns the correlation coefficient between two data sets.]
Formula result = 0.977200272
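The value returned by CORREL can be checked from the product-moment formula r = (nΣxy - ΣxΣy) / sqrt[(nΣx² - (Σx)²)(nΣy² - (Σy)²)]. The following is a minimal sketch in Python, not in Excel; the function name `pearson_r` is illustrative, and the x and y lists are the six pairs shown in the dialog box.

```python
from math import sqrt

x = [30, 35, 42, 48, 50, 51]   # Array1 (A2:A7)
y = [33, 41, 46, 52, 59, 55]   # Array2 (B2:B7)

def pearson_r(x, y):
    """Karl Pearson's correlation coefficient from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_xx - sum_x**2) * (n * sum_yy - sum_y**2))
    return numerator / denominator

r = pearson_r(x, y)
print(f"r = {r:.9f}")
```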


Correlation 
Enter the following values in SPSS and find the correlation between X and Y: 


Project 8.17 | Calculation of Average

Enter the following values in MS Excel and find the averages of X and Y:


Process: (i) Insert data in spreadsheet, (ii) Click on insert function fx, (iii) Select a category: Most Recently Used in the Insert Function dialog box, (iv) Select a function: AVERAGE, (v) Click on OK. Then the following dialog box will appear. The average of X is 44.… and the average of Y is 42.6667.


Calcul 

Enter the following Va 
me | I 
aa 


in 


8.2.3 Use of Excel in Probability Distributions

(a) Calculation of probability in Binomial Distribution

Process: (i) Click on insert function fx, (ii) Select a category: Statistical in the Insert Function dialog box, (iii) Select a function: BINOMDIST, (iv) Click on OK. Then the following dialog box will appear.

The function arguments for Binomial distribution are as follows:
i) Number_s is the number of successes in n trials. It is the value of X = 0, 1, 2, ..., n.
ii) Trials is the number of trials (n).
iii) Probability_s is the probability of success (p).
iv) Cumulative is a logical value: for the cumulative distribution function P(X ≤ x), use TRUE; for the probability mass function P(X = x), use FALSE.


[Excel Function Arguments dialog box for =BINOMDIST(5,10,0.65,FALSE): Number_s = 5, Trials = 10, Probability_s = 0.65, Cumulative = FALSE. Returns the individual term binomial distribution probability.]
Formula result = 0.153570411


(Binomial Distribution)

Using MS Excel, find the binomial probability distribution of a binomial random variable X with n = 10 and p = 0.65.

Solution:

F(x) = P(X ≤ x)

(iii) P(X ≤ 5) = 0.2485; (iv) P(X < 6) = 0.2485


(b) Calculation of probability using Normal Distribution

Process: (i) Click on insert function fx, (ii) Select a category: Statistical in the Insert Function dialog box, (iii) Select a function: NORMDIST, (iv) Click on OK. Then the following dialog box will appear.

The function arguments for normal distribution are as follows:
i) X is the value of the variable for which we want to calculate the probability.
ii) Mean is the arithmetic mean of the normal distribution.
iii) Standard_dev is the standard deviation of the normal distribution.
iv) Cumulative is a logical value: for the cumulative distribution function P(X ≤ x), use TRUE; for the probability density function f(x), use FALSE.

[Excel Function Arguments dialog boxes for NORMDIST: with Cumulative = FALSE, Formula result = 0.01295176; with Cumulative = TRUE, Formula result = 0.066807201]
Project 8.20 | (Normal Distribution)
The average height of students of a campus is 165 cm and the standard deviation is 10 cm. Using MS Excel, find the probability of students whose height is (i) 150 cm, (ii) less than 150 cm, (iii) 172 cm, (iv) less than 172 cm, (v) 180 cm, (vi) less than 180 cm.
Solution:

Height x | Cumulative = FALSE | Cumulative = TRUE
150 | 0.012951760 | 0.066807201
172 | 0.031225393 | 0.758036348
180 | 0.012951760 | 0.933192799

(i) P(X = 150) = 0.01295176, (ii) P(X ≤ 150) = 0.066807201,
(iii) P(X = 172) = 0.031225393, (iv) P(X ≤ 172) = 0.758036348,
(v) P(X = 180) = 0.01295176, (vi) P(X ≤ 180) = 0.933192799.
Note. Similarly, we can use MS Excel in other distributions.
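The six NORMDIST values above can be reproduced from the normal density and distribution functions. The sketch below is in Python rather than Excel; `normal_pdf` and `normal_cdf` are illustrative names. Note that for a continuous variable the value obtained with Cumulative = FALSE is the height of the density curve at x, not a probability in the strict sense.

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x, mean, sd):
    """Height of the normal density curve at x (Excel: Cumulative = FALSE)."""
    z = (x - mean) / sd
    return exp(-0.5 * z * z) / (sd * sqrt(2 * pi))

def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal variable (Excel: Cumulative = TRUE)."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

mean, sd = 165, 10   # average height and standard deviation in cm

for height in (150, 172, 180):
    print(f"x = {height}: density = {normal_pdf(height, mean, sd):.9f}, "
          f"P(X <= x) = {normal_cdf(height, mean, sd):.9f}")
```

The distribution function is written with the error function `erf`, which is available in the Python standard library, so no statistical package is assumed.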


8.3 Use of R in Statistical Computations

R possesses an extensive catalog of statistical and graphical methods. It includes machine learning algorithms, linear regression, time series and statistical inference, to name a few. Most of the R libraries are written in R, but for heavy computational tasks, C, C++ and Fortran codes are preferred.

Data analysis with R is done in a series of steps: programming, transforming, discovering, modeling and communicating the results.
Program: R is a clear and accessible programming tool.
Transform: R is made up of a collection of libraries designed specifically for data science.
Discover: Investigate the data, refine your hypothesis and analyze them.
Model: R provides a wide array of tools to capture the right model for your data.
Communicate: Integrate codes, graphs, and outputs to a report with R.

Getting started with R
- Starting R, the R console, and R editor
- Starting B, 
Data Set 


8.3.1 Creating a New 


indow. 
enone rner of the win 
. Click the R icon £ ak tab at the top left co f  weicanpaatiogene 
Solna Tile Vertically. Then we wil 
; . 5 
3. Click the Windows 


Peer Mc pocrges Wraows EP 
| Teer] [a] S] — 


| “Be Untitied - R Editor 


Ls ge R Console ~ 


R version 3.6.0 he 029-02 2S) =<. "Planting of 
Copyright (C) 2019 25S R Foundation for Stat 
Platform: i386-w64-—mingw32/i3986 (32-bit) : 
R is free software and comes with ABSOLUTE) 
You are welcome to redistribute it under ced 
Type ‘license()* or 'licence()' for distrin | 

| 


Natural language support but running in ar | 


R is a collaborative Project with many cont: 
Type ‘contributors()' for more information ; 
*citation()' on how to cite RorR Packages 


Type *demo()' for Some demos, *nelp()' for ¢ 


"help.start () * for an HTML browser interface 
Type 'q()' to quit R. 


(Basic Operations) 


TYPEX= (1,2,3, 4 in Red 


itor and find 
i) xt5, 


iii) x/5, 
+ “aM) V(exp(2) iL), 


ii) x ~| 4 . 

Vil) c=~ (x Iv) Hm 10, v) sqrt(x), vi) log(x), 
Vili 

Solution: ) sa 


1. Click the R icon if 


IX) xx y, X) mean(x), xi) varlt) 


2. Select File => New script tab at the top left corner of the window.
3. Click Windows => Tile Vertically. Then we will get the following window.
4. Type x <- c(1, 2, 3, 4) in R Editor, where the assignment operator <- is typed as an arrow with a minus sign. Click after x and then click the run button. Now you will get output on R Console.
5. Type x + 5. Click after x + 5 and then click the run button. Now you will get output on R Console as
[1] 6 7 8 9
You can also type on R Console and press the Enter key of the keyboard for the result.


10g (%) 

o<- (xt sqrt (x)}/ (exp (2}+2) 
c 

oxty 

xty 


> x/5
[1] 0.2 0.4 0.6 0.8
> y <- x - 10
> y
[1] -9 -8 -7 -6
> sqrt(x)
[1] 1.000000 1.414214 1.732051 2.000000
> log(x)
[1] 0.0000000 0.6931472 1.0986123 1.3862944
> var(x)
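The R session above applies each operation to every element of the vector x at once. For readers who want to check the same numbers without R, the following is a rough equivalent in Python, which is not used elsewhere in this text; the variable names are illustrative. It assumes x = (1, 2, 3, 4) as in the project, and it uses the sample variance (divisor n - 1), which is what R's var() computes.

```python
import statistics
from math import log, sqrt

x = [1, 2, 3, 4]

x_plus_5 = [v + 5 for v in x]      # R: x + 5
x_div_5 = [v / 5 for v in x]       # R: x/5
y = [v - 10 for v in x]            # R: y <- x - 10
sqrt_x = [sqrt(v) for v in x]      # R: sqrt(x)
log_x = [log(v) for v in x]        # R: log(x), natural logarithm
mean_x = statistics.mean(x)        # R: mean(x)
var_x = statistics.variance(x)     # R: var(x), sample variance

print(x_plus_5)
print(x_div_5)
print(y)
print([round(v, 6) for v in sqrt_x])
print([round(v, 7) for v in log_x])
print(mean_x, var_x)
```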


Project 8.22 | (Basic Operations) 
I d find 
ypex = {1, 2, 3,4, 5, 6, 7, 8, 9} in Reditor an . “19 sate 


jo xt5, il) xo! iii) 5, iv) ix) xxY 
vi) log(x), vii) c=-@t sqrt(x))/(exp) a), vi. Fe ? iid 
x) mean(x), xi) var(x). 
8.4 Use of Program Calculators

In this book, the solutions are computed by using the following calculators:
fx-991 EX


i i ata 


2x 
2§ PLU: ' _ 
Use of Calculator fx-991 Es is . 
Clearing tents of All (srr) [Ac] 
a j tions: 
Decimal: P rform the following key opera 
Fixing pre 


ne lens ial CRA) 
For fx-991 ES: [3] Number Format) i] ( for 4 decimal Places) o4, 
For fx-991 EX: or [OFF] for fx-991 ES series 
(stat) 


4 for fx-991 EX series 
OF 
[MENUSETUP) 


for fx-991 ES ser; 
: ; [1-VAR] or and so on Tes 
Inputting Data: - and so on for fx-991 Ex Series 


Stat Editor: Perform the key operations to display the Stat Editor: 


(i) STAT)[2] ata) for fx-991 ES series 
for fx-991 EX series 


Obtaining Statistical Values from Input Data
Perform the following key operations:
1. Sum: Σx², Σx, Σy², Σy, Σxy, Σx³, Σx²y, Σx⁴
[SHIFT] [1] (STAT) [3] (Sum) [1] to [8] for fx-991 ES series
or [2-Var Cal] for fx-991 EX series

2. Number of items: n; Mean: x̄, ȳ
Population standard deviation: σx, σy; Population variance: σ²x, σ²y
Sample standard deviation: sx, sy; Sample variance: s²x, s²y
[SHIFT] [1] (STAT) [4] (Var) [1] to [7] for fx-991 ES series

3. Regression coefficients: A, B; Correlation coefficient: r;
Estimated values: x̂, ŷ
[SHIFT] [1] (STAT) [5] (Reg) [1] to [5] for fx-991 ES series
for fx-991 EX series 


NOTE. Please do not explain the key operations in final examination. 


8.4.2 Use of Calculator for Paired-Variable Data

(Paired-Variable Data)

The registrar at a small university noted that the pre-enrollment figures and the actual enrollment figures for the past 6 years (in hundreds of students) were

x, pre-enrollment: 30 35 42 48 50 51
y, actual enrollment: 33 41 46 52 59 55

(a) Find linear correlation between x and y. (b) Find the least-squares line ŷ = a + bx. (c) Using the least-squares line, predict the actual number of students enrolled if the pre-enrollment figure is 50 students.

First clear the content and turn off the frequency.
For fx-991 ES series or fx-991 EX series, enter the data in the Stat Editor:
x: 30 35 42 48 50 51
y: 33 41 46 52 59 55


For fx-ES Series perform the following key operations:

(Sum)[1] [=] Result: Σx² = 11294
[=] Result: Σx = 256
[=] Result: Σy² = 14096
[=] Result: Σy = 286
[=] Result: Σxy = 12608
(Var)[1] [=] Result: n = 6
(Reg)[3] [=] Result: r = correlation coefficient = 0.9772
[=] Result: A = 1.0934 (= a)
[=] Result: B = 1.0916 (= b)

Calculating estimated value ŷ:

50 [SHIFT] (Reg)[5] [=] Result: ŷ = 55.67.

The required least square line ŷ = A + Bx is given as
ŷ = 1.0934 + 1.0916 x.

Alternatively,
$b_{yx} = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \dfrac{6 \times 12608 - 256 \times 286}{6 \times 11294 - (256)^2} = 1.0916$, $\bar{y} = 47.67$, $\bar{x} = 42.67$.

$y - \bar{y} = b_{yx}(x - \bar{x}) \Rightarrow y - 47.67 = 1.0916\,(x - 42.67)$.

Therefore, the least-squares line is ŷ = 1.0914 + 1.0916 x.

(c) If the pre-enrollment figure is 50 students, then
ŷ = 1.0914 + 1.0916 × 50 = 55.67.

Therefore, the actual number of students enrolled is 55.67.


Ip n 
(Paired-Variable Data) 


From the following data: 


Project 8.25 | (Single Variable)
The following table 3 oe os ef 
ee ee ee Ts 


P13 | 10 | 5 
i) Find mean, variance and standard deviation.
ii) Compute a value that measures the amount of variability relative to the value of mean.

8.4.3 Use of Calculator for Frequency Distribution

For this type of distribution, enable the frequency column as follows:
Frequency ON/OFF: : 
[ON] for EX series 
[on] 


Example 8.26 | (Frequency Distribution)

A purchasing agent obtained samples of bulbs from two suppliers. He had the samples tested in his own laboratory for length of life with the following results.

Length of life (hours) | Sample from company A | Sample from company B
700 - 900 | 10 | 3
900 - 1100 | 16 | 42
1100 - 1300 | 26 | 12
1300 - 1500 | 8 | 3

(a) Which company's bulb gives a higher average life?
(b) Which company's bulbs are more uniform?

Solution: Calculation of average life and coefficient of variation


Length of life (hours) | Mid-value (x) | From company A (f) | From company B (f)
700 - 900 | 800 | 10 | 3
900 - 1100 | 1000 | 16 | 42
1100 - 1300 | 1200 | 26 | 12
1300 - 1500 | 1400 | 8 | 3

For company A:

For fx-EX Series
For fx-ES series calculators: (STAT)[1] (1-VAR)

Note: Enter the mid-values in x-column in case of continuous frequency distribution.

800[=] 1000[=] 1200[=] 1400[=]
10[=] 16[=] 26[=] 8[=]
AC

[SHIFT] [1] (STAT) [3] (SUM)[1] [=] Result: Σx² = 75520000
(SUM)[2] [=] Result: Σx = 66400
(Var)[1] [=] Result: n = 60
(Var)[2] [=] Result: x̄ = 1106.67
(Var)[3] [=] Result: σx = σn(x) = Population s.d. = 184.27
(Var)[4] [=] Result: sx = σn-1(x) = Sample s.d. = 185.83

Using calculator, we get n = 60, Σfx = 66400, Σfx² = 75520000

Average life of bulb of company A is
$\bar{x}_A = \dfrac{\sum fx}{n} = \dfrac{66400}{60} = 1106.67$ hours.

Sample s.d. of life of bulb of company A is
$\sigma_{n-1}(A) = \sqrt{\dfrac{\sum fx^2}{n-1} - \dfrac{n}{n-1}\left(\dfrac{\sum fx}{n}\right)^2} = \sqrt{\dfrac{75520000}{59} - \dfrac{60}{59}\left(\dfrac{66400}{60}\right)^2} = 185.83$ hours.

Coefficient of variation of life of bulb of company A is
$C.V.(A) = \dfrac{\sigma_{n-1}(A)}{\bar{x}_A} \times 100\% = \dfrac{185.83}{1106.67} \times 100\% = 16.79\%$.

For company B:

Please type the frequency of B instead of A.

x | FREQ
800 | 3
1000 | 42
1200 | 12
1400 | 3

AC
(STAT)[3] (SUM)LI] [E] Result: Ex’ = 67080000 
(SUM) [2] [EE] Result: 2x = 63000 
[=] Result: n = 60 
[=] Result: x = 1050 
[=] Result: 0; = On(*) = Population s.4.= 124.50 
[=] Result: s,= On-1(«) = Sample s.d. = 125.55 
Using calculator, we get, 2 = 60, 2fx= 63000, 2% = 67080000 
2 63000 
Average life of bulb of company Ris * = a = 50 1050 hours 
Sample s.d. of life of bulb of company Bis 
Te nk 778000 _ G0. (63000) = 125.55 hows 
o,(B)=\fen1 at VE On! 60-1 
Coefficient of variation of life of bulb of company Bis 
195.50 = 11.96% 
CyB) = 2x 100% _ 155 100% 11.96% | 
( A's bulb gives a higher averae° we 
a) Here x¥,= 1106.67, x8 = 1050. SO; company pis bulbs are more uniform 
VB) = 11 96%, comp 


) cv = 16.79% > © 
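
The calculator steps above can also be cross-checked in a few lines of Python. This is only a sketch: the function name `grouped_stats` is illustrative, and the company B frequencies (3, 42, 12, 3) are taken as the values consistent with the totals quoted in the solution (n = 60, Σfx = 63000, Σfx² = 67080000).

```python
from math import sqrt

def grouped_stats(mid_values, freqs):
    """Mean, sample s.d. and CV (%) for a grouped frequency distribution."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(mid_values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(mid_values, freqs))
    mean = sum_fx / n
    # Sample s.d. with divisor (n - 1), as in the worked solution
    sample_sd = sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))
    cv = sample_sd / mean * 100
    return {"n": n, "sum_fx": sum_fx, "sum_fx2": sum_fx2,
            "mean": mean, "sample_sd": sample_sd, "cv": cv}

mid_values = [800, 1000, 1200, 1400]   # class mid-values
freq_a = [10, 16, 26, 8]               # company A
freq_b = [3, 42, 12, 3]                # company B (assumed, see note above)

stats_a = grouped_stats(mid_values, freq_a)
stats_b = grouped_stats(mid_values, freq_b)

for name, s in (("A", stats_a), ("B", stats_b)):
    print(f"Company {name}: mean = {s['mean']:.2f}, "
          f"s.d. = {s['sample_sd']:.2f}, CV = {s['cv']:.2f}%")
```

The company with the larger mean has the higher average life, and the company with the smaller CV is the more uniform one.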


Example 8.27 (Frequency Distribution)
The following table gives the length of life of electric bulbs.
Length of life (hours): ..., 1800 - 2000
Find the average and standard deviation of the length of life of electric bulbs.

Example 8.28 (Frequency Distribution)
The following table gives the length of life of electric bulbs.
Length of life (hours) | Company A | Company B
200 - 400 | 10 | 3
400 - 600 | 16 |
600 - 800 | 26 |
800 - 1000 | 8 | 3
(a) Which company's bulb gives a higher average life?
(b) Which company's bulbs are more uniform?


Poisson Distribution Function


1.000 
1.000 
1.000 


1.000 
1.000 
1.000 
1.000 
0.999 


0.999 
0.998 
0.998 
0.997 
0.996 


0.994 
0.992 
0.990 
0.987 
0.983 


0.975 
0.964 
0.951 


8 
9 
, aed 
1.000 
1.000 
1.000 
1.000 
0.999 1.000 
0.999 1.000 
0.999 1.000 
0.998 1.000 
0.997 0.999 1.000 
0.997 0.999 1.000 
0.995 0.999 1.000 
0.993 0.998 1.000 
0.988 0.997 0.999 1.000 
0.983 0.995 0.999 1.000 
(Continued on following page)
POSITIVE z Scores
Cumulative Area from the LEFT

$F(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^{2}/2}\, dt$

z | 0.00 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09
0.0 | 0.5000 | 0.5040 | 0.5080 | 0.5120 | 0.5160 | 0.5199 | 0.5239 | 0.5279 | 0.5319 | 0.5359
0.1 | 0.5398 | 0.5438 | 0.5478 | 0.5517 | 0.5557 | 0.5596 | 0.5636 | 0.5675 | 0.5714 | 0.5753
0.2 | 0.5793 | 0.5832 | 0.5871 | 0.5910 | 0.5948 | 0.5987 | 0.6026 | 0.6064 | 0.6103 | 0.6141
0.3 | 0.6179 | 0.6217 | 0.6255 | 0.6293 | 0.6331 | 0.6368 | 0.6406 | 0.6443 | 0.6480 | 0.6517
0.4 | 0.6554 | 0.6591 | 0.6628 | 0.6664 | 0.6700 | 0.6736 | 0.6772 | 0.6808 | 0.6844 | 0.6879
0.5 | 0.6915 | 0.6950 | 0.6985 | 0.7019 | 0.7054 | 0.7088 | 0.7123 | 0.7157 | 0.7190 | 0.7224
0.6 | 0.7257 | 0.7291 | 0.7324 | 0.7357 | 0.7389 | 0.7422 | 0.7454 | 0.7486 | 0.7517 | 0.7549
0.7 | 0.7580 | 0.7611 | 0.7642 | 0.7673 | 0.7704 | 0.7734 | 0.7764 | 0.7794 | 0.7823 | 0.7852
0.8 | 0.7881 | 0.7910 | 0.7939 | 0.7967 | 0.7995 | 0.8023 | 0.8051 | 0.8078 | 0.8106 | 0.8133
0.9 | 0.8159 | 0.8186 | 0.8212 | 0.8238 | 0.8264 | 0.8289 | 0.8315 | 0.8340 | 0.8365 | 0.8389
1.0 | 0.8413 | 0.8438 | 0.8461 | 0.8485 | 0.8508 | 0.8531 | 0.8554 | 0.8577 | 0.8599 | 0.8621
1.1 | 0.8643 | 0.8665 | 0.8686 | 0.8708 | 0.8729 | 0.8749 | 0.8770 | 0.8790 | 0.8810 | 0.8830
1.2 | 0.8849 | 0.8869 | 0.8888 | 0.8907 | 0.8925 | 0.8944 | 0.8962 | 0.8980 | 0.8997 | 0.9015
1.3 | 0.9032 | 0.9049 | 0.9066 | 0.9082 | 0.9099 | 0.9115 | 0.9131 | 0.9147 | 0.9162 | 0.9177
1.4 | 0.9192 | 0.9207 | 0.9222 | 0.9236 | 0.9251 | 0.9265 | 0.9279 | 0.9292 | 0.9306 | 0.9319
1.5 | 0.9332 | 0.9345 | 0.9357 | 0.9370 | 0.9382 | 0.9394 | 0.9406 | 0.9418 | 0.9429 | 0.9441
1.6 | 0.9452 | 0.9463 | 0.9474 | 0.9484 | 0.9495 | 0.9505 | 0.9515 | 0.9525 | 0.9535 | 0.9545
1.7 | 0.9554 | 0.9564 | 0.9573 | 0.9582 | 0.9591 | 0.9599 | 0.9608 | 0.9616 | 0.9625 | 0.9633
1.8 | 0.9641 | 0.9649 | 0.9656 | 0.9664 | 0.9671 | 0.9678 | 0.9686 | 0.9693 | 0.9699 | 0.9706
1.9 | 0.9713 | 0.9719 | 0.9726 | 0.9732 | 0.9738 | 0.9744 | 0.9750 | 0.9756 | 0.9761 | 0.9767
2.0 | 0.9772 | 0.9778 | 0.9783 | 0.9788 | 0.9793 | 0.9798 | 0.9803 | 0.9808 | 0.9812 | 0.9817
2.1 | 0.9821 | 0.9826 | 0.9830 | 0.9834 | 0.9838 | 0.9842 | 0.9846 | 0.9850 | 0.9854 | 0.9857
2.2 | 0.9861 | 0.9864 | 0.9868 | 0.9871 | 0.9875 | 0.9878 | 0.9881 | 0.9884 | 0.9887 | 0.9890
2.3 | 0.9893 | 0.9896 | 0.9898 | 0.9901 | 0.9904 | 0.9906 | 0.9909 | 0.9911 | 0.9913 | 0.9916
2.4 | 0.9918 | 0.9920 | 0.9922 | 0.9925 | 0.9927 | 0.9929 | 0.9931 | 0.9932 | 0.9934 | 0.9936
2.5 | 0.9938 | 0.9940 | 0.9941 | 0.9943 | 0.9945 | 0.9946 | 0.9948 | 0.9949 | 0.9951 | 0.9952
2.6 | 0.9953 | 0.9955 | 0.9956 | 0.9957 | 0.9959 | 0.9960 | 0.9961 | 0.9962 | 0.9963 | 0.9964
2.7 | 0.9965 | 0.9966 | 0.9967 | 0.9968 | 0.9969 | 0.9970 | 0.9971 | 0.9972 | 0.9973 | 0.9974
2.8 | 0.9974 | 0.9975 | 0.9976 | 0.9977 | 0.9977 | 0.9978 | 0.9979 | 0.9979 | 0.9980 | 0.9981
2.9 | 0.9981 | 0.9982 | 0.9982 | 0.9983 | 0.9984 | 0.9984 | 0.9985 | 0.9985 | 0.9986 | 0.9986
3.0 | 0.9987 | 0.9987 | 0.9987 | 0.9988 | 0.9988 | 0.9989 | 0.9989 | 0.9989 | 0.9990 | 0.9990
3.1 | 0.9990 | 0.9991 | 0.9991 | 0.9991 | 0.9992 | 0.9992 | 0.9992 | 0.9992 | 0.9993 | 0.9993
3.2 | 0.9993 | 0.9993 | 0.9994 | 0.9994 | 0.9994 | 0.9994 | 0.9994 | 0.9995 | 0.9995 | 0.9995
3.3 | 0.9995 | 0.9995 | 0.9995 | 0.9996 | 0.9996 | 0.9996 | 0.9996 | 0.9996 | 0.9996 | 0.9997
3.4 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9998
3.5 | 0.9998
4.0 | 0.99997
4.5 | 0.999997

NOTE: For values of z above 3.49, use 0.9999 for the area.
Use these common values that result from interpolation:
z score | Area
1.645 | 0.9500
2.575 | 0.9950

Common Critical Values
Confidence Level | Critical Value
0.90 | 1.645
0.95 | 1.960
0.99 | 2.575
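
As a cross-check on the table, the cumulative area from the left can be computed from the error function, since $F(z) = \frac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$. The following is a minimal sketch using only Python's standard library; the helper names `phi` and `table_entry` are illustrative.

```python
from math import erf, sqrt

def phi(z):
    """Cumulative area from the left under the standard normal curve."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def table_entry(z):
    """Area rounded to four decimals, the precision used in the table."""
    return round(phi(z), 4)

# Row 1.0, column 0.00, and row 1.9, column 0.06
print(table_entry(1.00))
print(table_entry(1.96))

# The interpolated values quoted in the notes
print(round(phi(1.645), 4), round(phi(2.575), 4))
```

Reading the table in reverse gives the critical values: a 95% confidence level leaves 0.025 in each tail, so the critical value is the z whose area from the left is 0.9750, that is 1.96.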
Table A-3 Values of $t_\alpha$
ν | α = 0.10 | α = 0.05 | α = 0.025 | α = 0.01 | α = 0.00833 | α = 0.00625 | α = 0.005 | ν
: 
| 2 1.886 2.920 
; 3 1638 2353 3.182 4.541 4.857 5,392 2 
5.84] 
a ee ee ee 3.747 3.961 4.315 4.604 ; 
2.015 2.571 3.365 3.534 3.810 4.032 : 
6 1.440 1. 
7) 141s (895 «2.365 (2.998 3128 ve sec 6 
: 1397 1860-2306 2.896 3.016 ee 3.499 7 
‘6 om 1.833 2.262 2.821 2.934 “i 06 3.355 8 
1812 2228 2764 2.870 All 3.250 9 
1 1.363 1,796 | ik Sars PI) 
12 ieee toe Sane 2.820 
13 ee, ee 2.780 2.891 3.106 | 1 
14 is ii | Se Doe 2.934 3.055 | 12 
15 1341 1.753 145 2.624 2.718 2.896 3.012 13 
2.131 2.602 2.694 2.846 2.977 | 14 
16 ia dag 2837 soRW |. 1s 
17 2.120 
; 1333 1.740 2.583 ve 
8 1330 1.73 0 2.567 2.813 29 
19 rene ae a 2.655 ne 921 16 
20 eee 1.729 2.093 2539 2.639 27 2.898 7 
1725 2.086 959 2.625 ais 2.878 | 18 
21 1323 14 oak 2613 2.759 2.878 19 
: 1301 yeh 2080 2.sig 744 2.845 | 20 
» ons 
24 1319 1.714 ; 074 2.508 2.602 4.939 
me 1318 1.7] 069 9599 2.59] : 2.831 | 2 
1316 1.79 2.064 2.495 2.589 .720 2.819 my) 
et 2.060 9 4gs 2.574 age 2.307 | 23 
27 ae 1.706 2.566 cies 2.797 24 
314 1.793 056 2.479 .692 2.787 25 


Ai 
F Distribution (α = 0.05 in the right tail)
Columns: Numerator degrees of freedom (df1)
Rows: Denominator degrees of freedom (df2)

df2 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
1 | 161.45 | 199.50 | 215.71 | 224.58 | 230.16 | 233.99 | 236.77 | 238.88 | 240.54
2 | 18.513 | 19.000 | 19.164 | 19.247 | 19.296 | 19.330 | 19.353 | 19.371 | 19.385
3 | 10.128 | 9.5521 | 9.2766 | 9.1172 | 9.0135 | 8.9406 | 8.8867 | 8.8452 | 8.8123
4 | 7.7086 | 6.9443 | 6.5914 | 6.3882 | 6.2561 | 6.1631 | 6.0942 | 6.0410 | 5.9988
5 | 6.6079 | 5.7861 | 5.4095 | 5.1922 | 5.0503 | 4.9503 | 4.8759 | 4.8183 | 4.7725
6 | 5.9874 | 5.1433 | 4.7571 | 4.5337 | 4.3874 | 4.2839 | 4.2067 | 4.1468 | 4.0990
7 | 5.5914 | 4.7374 | 4.3468 | 4.1203 | 3.9715 | 3.8660 | 3.7870 | 3.7257 | 3.6767
8 | 5.3177 | 4.4590 | 4.0662 | 3.8379 | 3.6875 | 3.5806 | 3.5005 | 3.4381 | 3.3881
9 | 5.1174 | 4.2565 | 3.8625 | 3.6331 | 3.4817 | 3.3738 | 3.2927 | 3.2296 | 3.1789
10 | 4.9646 | 4.1028 | 3.7083 | 3.4780 | 3.3258 | 3.2172 | 3.1355 | 3.0717 | 3.0204
11 | 4.8443 | 3.9823 | 3.5874 | 3.3567 | 3.2039 | 3.0946 | 3.0123 | 2.9480 | 2.8962
12 | 4.7472 | 3.8853 | 3.4903 | 3.2592 | 3.1059 | 2.9961 | 2.9134 | 2.8486 | 2.7964
13 | 4.6672 | 3.8056 | 3.4105 | 3.1791 | 3.0254 | 2.9153 | 2.8321 | 2.7669 | 2.7144
14 | 4.6001 | 3.7389 | 3.3439 | 3.1122 | 2.9582 | 2.8477 | 2.7642 | 2.6987 | 2.6458
15 | 4.5431 | 3.6823 | 3.2874 | 3.0556 | 2.9013 | 2.7905 | 2.7066 | 2.6408 | 2.5876
16 | 4.4940 | 3.6337 | 3.2389 | 3.0069 | 2.8524 | 2.7413 | 2.6572 | 2.5911 | 2.5377
17 | 4.4513 | 3.5915 | 3.1968 | 2.9647 | 2.8100 | 2.6987 | 2.6143 | 2.5480 | 2.4943
18 | 4.4139 | 3.5546 | 3.1599 | 2.9277 | 2.7729 | 2.6613 | 2.5767 | 2.5102 | 2.4563
19 | 4.3807 | 3.5219 | 3.1274 | 2.8951 | 2.7401 | 2.6283 | 2.5435 | 2.4768 | 2.4227
20 | 4.3512 | 3.4928 | 3.0984 | 2.8661 | 2.7109 | 2.5990 | 2.5140 | 2.4471 | 2.3928
21 | 4.3248 | 3.4668 | 3.0725 | 2.8401 | 2.6848 | 2.5727 | 2.4876 | 2.4205 | 2.3660
22 | 4.3009 | 3.4434 | 3.0491 | 2.8167 | 2.6613 | 2.5491 | 2.4638 | 2.3965 | 2.3419
23 | 4.2793 | 3.4221 | 3.0280 | 2.7955 | 2.6400 | 2.5277 | 2.4422 | 2.3748 | 2.3201
24 | 4.2597 | 3.4028 | 3.0088 | 2.7763 | 2.6207 | 2.5082 | 2.4226 | 2.3551 | 2.3002
25 | 4.2417 | 3.3852 | 2.9912 | 2.7587 | 2.6030 | 2.4904 | 2.4047 | 2.3371 | 2.2821
26 | 4.2252 | 3.3690 | 2.9752 | 2.7426 | 2.5868 | 2.4741 | 2.3883 | 2.3205 | 2.2655
27 | 4.2100 | 3.3541 | 2.9604 | 2.7278 | 2.5719 | 2.4591 | 2.3732 | 2.3053 | 2.2501
28 | 4.1960 | 3.3404 | 2.9467 | 2.7141 | 2.5581 | 2.4453 | 2.3593 | 2.2913 | 2.2360
29 | 4.1830 | 3.3277 | 2.9340 | 2.7014 | 2.5454 | 2.4324 | 2.3463 | 2.2783 | 2.2229
30 | 4.1709 | 3.3158 | 2.9223 | 2.6896 | 2.5336 | 2.4205 | 2.3343 | 2.2662 | 2.2107
40 | 4.0847 | 3.2317 | 2.8387 | 2.6060 | 2.4495 | 2.3359 | 2.2490 | 2.1802 | 2.1240
60 | 4.0012 | 3.1504 | 2.7581 | 2.5252 | 2.3683 | 2.2541 | 2.1665 | 2.0970 | 2.0401
120 | 3.9201 | 3.0718 | 2.6802 | 2.4472 | 2.2899 | 2.1750 | 2.0868 | 2.0164 | 1.9588
∞ | 3.8415 | 2.9957 | 2.6049 | 2.3719 | 2.2141 | 2.0986 | 2.0096 | 1.9384 | 1.8799


252.20 
19.479 


ii : 
| eke = 19.413 19.429 19.446 es non facie sie er ® 
; ie 8.7446 8.7029 ite aay es ne ee oe - 
‘ 5.9644 5.9117 ein 4 4.5272 4.4957 4.4638 4.4314 4.3085 rd 
s | 4.7351 4.6777 461 : 
9999 3.9381 3.8742 3.8415 3.8082 3.7743 3.7398 3.7047 hie 
6 | 4 wai es 3.5107 3.4445 3.4105 3.3758 3.3404 3.3043 3.2674 - 
: ee 3.2839 3.2184 3.1503 3.1152 3.0794 3.0428 3.0053 2.9669 a, 
§ 2.9365 2.9005 2.8637 2.8259 2.7872 2.7475 ss 


2.6996 2.6609 2.6211 2.5801 2.5379 


2.5705 2.5309 2.4901 2.4480 a.4n4e 
| aisha eee 2.4663 2.4259 2.3842 2.3410 2.2969 

2.4589 2.4202 2.3803 2.3392 2.2966 2.2524 2.2964 
2.2664 = 2.2229 2.1778 24139) 
2.2468 = 2.2043 2.1601 2.1141 2.0658 


16 | 24935 2.4247 2.3522 2.2756 2.2354 2.1938 2.1507 2.1058 2.0589 2.0096 


17 | 2.4499 2.3807 2.3077 2.2304 2.1898 2.1477 2.1040 2.0584 2.0107 1.9604 
. sss sits 2.2686 2.1906 2.1497 2.107] 2.0629 2.0166 1.9681 9168 
ae on 2.2341 2.1555 2.1141 2.0712 2.0264 1.9795 1.9302 1.8780 

: 2.2033 2.1242 2.0825 2.0391 1.9938 1.9464 1.8963 18432 


Denominator degrees of freedom (df;) 
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1.9165 1.8657 1.8117 
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1.9838 1.9399 pees 1.8648 1.8128 1.7570 
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26 | 2.2197 91479 2.0716 
5; v3 20558 ne (464 1.9010 1.3533 8027 —-1.7488._—_—*1.6906 
29 | 2.1768 ined 2.041] 1.9586 1.9299 1.8842 1.836] we 1 ase 1.6717 
30 | 2.1646 oo 70275 | gaae 19147 1.8687 1.8203 7138 1.654! 
092] 2.0148 ne 1.9005 1.8543 ne 1.7689 ne ee 
40 | 2.0772 2.0035 18874 1 8409 1.7918 aie es 1.6223 

60 | 1.9926 jing thes 18389 | 959 si 
1.9105 ie 74801 709) “T4M4  1.6998 4.6373 1.5766 ee 


1.6 
491 1.5943 1.5343 ‘1.4673 

1.4290 

1.3180 


ETRE atues of Foo; 


2 = Degrees of Vv, = Degrees of Freedom for Numerator 


Freedom for 

Denominator pea a) ee 2 sea 2 | 38 | 20 | 25 | 9 [oo | |] | 
6,157] 6,209] 6,240 6,261] 6,287) 6,313 6,339 | 6,366 
99.43 | 99.45] 99.46] 99.57] 99.47 99.48 | 99.49] 99.50 


34.12 | 30.82 26.87 | 26.69} 26.58] 26.50] 26.41] 26.32} 26.22 26.13 
21.20 | 18.00 14.20} 14.02} 13.91] 13.84] 13.75] 13.65] 13.56| 13.46 
9.72] 9.55) 9.45) 9.38} 9.29] 9.20] 9.11] 9.02 


13.75 | 10.92 7.56| 7.40) 7.30] 7.23} 7.14] 7.06] 6.97] 6.88 
12.25) 9.55 6.31] 6.16] 6.06; 5.99} 5.91] 5.82] 5.74] 5.65 
11.26] 8.65 5.52] 5.36] 5.26} 5.20} 5.12] 5.03] 4.95] 4.86 

4.96; 4.81] 4.71] 4.65] 4.57] 4.48] 4.40] 4.3] 


10.56} 8.02 
10.04] 7.56 


4.31] 4.25} 4.17] 4.08] 4.00] 3.91 


4.56] 4.41 


9.65) 7.21 4.25| 4.10] 4.01] 3.94] 3.86] 3.78] 3.69] 3.60 
9.33) 6.93 4.01/ 3.86) 3.76] 3.70] 3.62] 3.54] 3.45] 3.36 
9.07 | 6.70 3.82] 3.66; 3.57} 3.51] 3.43] 3.34] 3.25} 3.17 

3.66; 3.51] 3.41] 3.35] 3.27] 3.18] 3.09] 3.00 


8.86| 6.51 
3.21] 3.13} 3.05; 2.96] 2.87 


8.68 | 6.36 


3.52] 3.37} 3.28 


3.10} 3.02} 2.93} 2.84] 2.75 
3.00} 2.92} 2.83} 2.75] 2.65 
2.92} 2.84) 2.75] 2.66] 2.57 
2.84; 2.76] 2.67] 2.58] 2.49 
2.78} 2.69) 2.61/ 2.52] 2.42 


3.16 
3.07 
2.98 
2.91 
2.84 


3.41] 3.26 
3.31] 3.16 
3.23} 3.08 
3.15} 3.00 
3.09] 2.94 


8.53 | 6.23 
8.40) 6.11 
8.29| 6.01 
8.18] 5.93 
8.10} 5.85 


2.72} 2.64} 2.55] 2.46; 2.36 


8.02] 5.78 3.03} 2.88} 2.79 
7.95 | 5.72 2.98} 2.83] 2.73] 2.67] 2.58] 2.50] 2.40] 2.31 
7.88 | 5.66 2.93] 2.78] 2.69] 2.62] 2.54] 2.45] 2.35] 2.26 
7.82| 5.61 2.89] 2.74} 2.64} 2.58; 2.49} 2.40] 2.31} 2.21 
i 2 22 nH 
7.77\ 5.57 2.85} 2.70] 2.60] 2.54] 2.45] 2.36] 2.27] 2.17 5 
=. 
7.56| 5.39 2.70] 2.55{ 2.45] 2.39] 2.30; 2.21] 2.11] 2.01 <S 
7.31) 5.18 2.52] 2.37] 2.27} 2.20} 2.11] 2.02] 1.92} 1.80 5 
7.08 | 4.98 2.35} 2.20] 2.10} 2.03} 1.947 1.84} 1.73] 1.60 x 
6.85 4.79 2.19) 2.03}; 1.93] 1.86} 1.76; 1.66] 1.53] 1.38 Sy 
6.63 461 2.04; 1.88] 1.77] 1.70; 1.59} 1.47} 1.32] 1.00 g 
Bachelor in Computer Applications
Course Title: Probability and Statistics
Code No: CAST 202
Semester: III
Full Marks: 60
Pass Marks: 24
Time: 3 hours
Candidate's Symbol No:
Candidates are required to answer the questions in their own words as far as possible.

Group A
Attempt all the questions. [10 x 1 = 10]
1. Circle (O) the correct answer.
i) How many types of data are there on the basis of sources of data collection?
a) 1 b) 2 c) 3 d) 4
ii) Which is the more appropriate measure of central tendency to find the average of profit?
a) Arithmetic mean b) Median c) Mode
iii) What is the range of correlation?
a) 0 to ∞ b) −∞ to ∞ c) −1 to 1 d) 0 to 1
iv) If r = 0.2, then the coefficient of determination implies that
a) 20% of total variation in dependent variable has been explained by independent variable.
v) What is the minimum value of probability?
a) 1
vi) In case of normal distribution
a) Mean > Median b) Mean = Median
vii) The regression lines of X on Y and Y on X intersect at the
a) (0, 0) point b) (a, b) point c) ($\bar{X}$, $\bar{Y}$) point d) (x, y) point
viii) In case of systematic sampling
a) sample mean is a biased estimator of population mean.
b) sample mean is an unbiased estimator of population mean.
c) sample mean can't estimate population mean.
d) sample mean may be equal to population mean.
ix) Mean of Chi-Square distribution with n degrees of freedom is
a) 1 b) 0 c) 2n d) n
x) How do you obtain degrees of freedom in one-way ANOVA?
a) (k, n − 1) b) (k, n − k) c) (k − 1, n − 1) d) (k − 1, n − k)
Group B
Attempt any SIX questions.
2. Describe scope and limitation of statistics.
3. Determine average wages from following data:


Paes” [S010 351 a5 al 


Fa [usin 
(in| a6 alos SOT 
ee 


a ee ee 
7. How do you determine sample size in sampling? Explain briefly.
8. Write short notes on simple random sampling.

Group C
Attempt any TWO questions. [2 x 10 = 20]
9. The age in the regular daytime BCA program and the morning time BCA program of a college are described by two samples. If the homogeneity in age of the class is a positive factor in learning, make suggestions, with reason, which of the two groups will be easier to teach.

—— 


Morning BCA program ay 


ty Age Number of Student, 

ae ee os i i0 
if . = 
| page ee 30 - 
i | 
| —— ; ie 

28 : 
33 ; 

m) 


i 
it 
| 


NlN] uw luolrwlw 
chet 
Sia 


10. Given a normal distribution with mean 200 and s.d. 20, find the probability
(i) P(X > 180) (ii) P(X < 220) (iii) P(160 < X < 280) (iv) P(X = 220)
(v) P(X < 180 or X > 220) (vi) 10% of the values are less than what value of X?
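
Parts of this question that read clearly in the text, namely (i), (ii), (v) and (vi), can be checked numerically. The sketch below uses `statistics.NormalDist` from Python's standard library; the variable names are illustrative. Part (vi) is an inverse problem: it asks for the value of X with cumulative area 0.10, which is what `inv_cdf` returns.

```python
from statistics import NormalDist

X = NormalDist(mu=200, sigma=20)   # mean 200, s.d. 20

p_above_180 = 1 - X.cdf(180)                 # (i)  P(X > 180)
p_below_220 = X.cdf(220)                     # (ii) P(X < 220)
p_outside = X.cdf(180) + (1 - X.cdf(220))    # (v)  P(X < 180 or X > 220)
x_10 = X.inv_cdf(0.10)                       # (vi) value below which 10% lie

print(f"(i)  {p_above_180:.4f}")
print(f"(ii) {p_below_220:.4f}")
print(f"(v)  {p_outside:.4f}")
print(f"(vi) {x_10:.2f}")
```

By hand, the same results follow from standardising with z = (X − 200)/20 and reading the cumulative areas for z = −1 and z = 1 from the normal table.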


11. The labor productivity indexes of Nepal are recorded below:


Sector Year 


2015 


Agriculture 


Manufacturing 


Community and social service 


Does the labor productivity index vary due to the:
(i) difference in the sector
(ii) difference in the time period?


Group A = 1 x 10 = 10 (Q. 1)
Group B = 6 x 5 = 30 (Q. 2 to 8)
Group C = 2 x 10 = 20 (Q. 9 to 11)
Practical = 40
Group-B
Attempt any SIX questions.


f . 
. 


/ 


Sea er ee oa Oe 


0) 


Less | Less | Less | Less | L life ti 
than than than om 
500 600 


The following data shows the life time in hours of 400 tube lights.

Describe the organizational aspect of sampling survey.


Compute the coefficient of correlation from the following results obtained between two variables.


of sets 


Arithmetic mean 


Summation of products of deviations of variables X and Y from their respective means is 46.
[Ans. r = 0.997. There is a very high degree of positive correlation between the two variables X and Y.]
Seven coins are tossed and the number of heads noted. The experiment is repeated 128 times and the 


i 


No. of heads. - 


Frequencies 


Fit a binomial distribution assuming the coins are unbiased.
Ans. Expected frequency distribution of heads and tails is
No. of heads (X)
Expected frequencies
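
Fitting a binomial distribution here means computing the expected frequency $N \cdot \binom{n}{x} p^{x} (1-p)^{n-x}$ for each number of heads x, with n = 7 coins, p = 0.5 for unbiased coins and N = 128 repetitions. A minimal sketch, with an illustrative function name:

```python
from math import comb

def binomial_expected_frequencies(n_coins, p, n_repeats):
    """Expected frequency of x heads, for x = 0, 1, ..., n_coins."""
    return [
        n_repeats * comb(n_coins, x) * p ** x * (1 - p) ** (n_coins - x)
        for x in range(n_coins + 1)
    ]

expected = binomial_expected_frequencies(n_coins=7, p=0.5, n_repeats=128)

for x, freq in enumerate(expected):
    print(f"No. of heads = {x}: expected frequency = {freq:g}")
```

Because 128 = 2^7, each expected frequency reduces to the binomial coefficient $\binom{7}{x}$, and the frequencies add up to 128.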


7. Discuss the quota sampling and systematic sampling.
8. A bag contains 8 red, 4 white and 5 black coloured balls. Three balls are drawn randomly from the bag. Find the probability that (i) all are red (ii) 2 are red and 1 is white (iii) 2 are red and 1 other (iv) all are of different colours.
[Ans. (i) 0.082 (ii) 0.1647 (iii) 0.37058 (iv) 0.235]
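
The quoted answers can be reproduced by counting combinations: there are $\binom{17}{3} = 680$ equally likely ways to draw 3 balls from 17. The sketch below uses `math.comb`; the variable names are illustrative, and part (iv) is read as "all of different colours", which is the event that matches the quoted answer 0.235.

```python
from math import comb

red, white, black = 8, 4, 5
total_ways = comb(red + white + black, 3)        # 680 ways to draw 3 of 17

p_all_red = comb(red, 3) / total_ways                                # (i)
p_two_red_one_white = comb(red, 2) * comb(white, 1) / total_ways     # (ii)
p_two_red_one_other = comb(red, 2) * comb(white + black, 1) / total_ways  # (iii)
p_all_different = red * white * black / total_ways                   # (iv)

for label, p in (("(i)", p_all_red), ("(ii)", p_two_red_one_white),
                 ("(iii)", p_two_red_one_other), ("(iv)", p_all_different)):
    print(label, round(p, 4))
```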
Group-C
Attempt any TWO questions.
9. Family income and its percentage spent on food in the case of hundred families gives the following bivariate frequency distribution.


_efficient. 


, ‘Oo 
Find regression O°" i di 
: nd 
‘Find regression equation: ee fi expe ey 
i - sate the income of @ family test the significa 
iii) Estmate jation co-efficient also 9.6x + 666 and x = ~ 0.02y +315 ili) Rg 463 
: ‘ rrela » = 7.0% eat , 
iv) Calculate co 6, 0.02 ii) J iv) r is significant because > 6p, 5 
Ey 


Ans. i) —9:° k 
10. The table given below represents the daily wage distribution of 130 workers. Find out the range of income of the middle 60% workers.
Wage (Rs. per week): More than 70, More than 85, More than 100, More than 115, More than 130, More than 145, More than 160, More than 175
No. of workers: 130, ...
[Ans. The range of income $P_{20}$ to $P_{80}$ = 102.50 to 145 and Range = 145 − 102.50 = Rs. 42.5]
11. The following data represents the number of units of production per day turned out by 4 different workmen using different types of machines.
Machine Type
44 38
54
44
46
a. Test whether the mean productivity is the same for the three different machine types.
b. Test whether 4 workmen differ with respect to mean productivity.
[Ans. a. F = 11.1 at df = (2, 6). Critical value: $F_{0.05}$ for (2, 6) df = 5.14. Since the calculated value of F is greater than the tabulated value of F, the null hypothesis $H_0$ is rejected. Therefore, there is a difference in mean productivity of the machine types.
b. F = ... at df = (3, 6). Critical value: $F_{0.05}$ for (3, 6) df = 4.76. Since the calculated value of F is less than the tabulated value, the null hypothesis $H_0$ is accepted. Therefore, we conclude that there is no significant difference in mean productivity due to workmen.]


earson’s correlation coefficient ¢ Ps 
e Karl P Coefficient form the fol} ————! portany Ou 
ee ’ _ eat 
719 9 6 Pe ‘ollowii ng data ———_£Mestion Sets 315 


108 


ar us; A 
<1 USing following is renee) 
18 data, 


Ae Of car is 22 ye 


2000 | 
[Ans. Rs. 2680.90] 


4. Describe the sampling error and non-sampling error.
8. Write a short note on stratified sampling.
Group-C
Attempt any TWO questions. [2 x 10 = 20]
9. A sample of 60 cars of two makes P and Q is taken and their average running life in years is recorded as follows:


Which make shows greater consistency in performance and why? [Ans. Make P] 


10. Let X be a normally distributed variable with mean (μ) = 30 and S.D. (σ) = 4. Then compute the following probabilities.
a. P(X < 34) b. P(X > 28) c. P(26 < X < 34) d. P(X > 36) e. P(X < 22)
[Ans. 0.841, 0.6915, 0.6827, 0.0668, 0.0228]
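
Each part reduces to a cumulative area after standardising with z = (X − 30)/4, so the answers can be verified directly. A short sketch with `statistics.NormalDist`; the dictionary name `answers` is illustrative.

```python
from statistics import NormalDist

X = NormalDist(mu=30, sigma=4)   # mean 30, s.d. 4

answers = {
    "a": X.cdf(34),                # P(X < 34), z = 1
    "b": 1 - X.cdf(28),            # P(X > 28), z = -0.5
    "c": X.cdf(34) - X.cdf(26),    # P(26 < X < 34), z from -1 to 1
    "d": 1 - X.cdf(36),            # P(X > 36), z = 1.5
    "e": X.cdf(22),                # P(X < 22), z = -2
}

for part, p in answers.items():
    print(f"{part}. {p:.4f}")
```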
11. The following table gives the number of refrigerators sold by 4 salesmen in three months May, June and July:

(i) Is there a significant difference in the sales made by the four salesmen?
(ii) Is there a significant difference in the sales made during different months?
[Ans. (i) $F_{cal}$ = 1.02, Tabulated $F_{0.05}(3, 6)$ = 4.76. Since $F_{cal} < F_{tab}$ for salesmen, it is not significant and $H_0$ is accepted, which means that there is no significant difference in the sales made by the four salesmen.
(ii) $F_{cal}$ = 3.33, Tabulated $F_{0.05}(2, 6)$ = 5.14. Since $F_{cal} < F_{tab}$ for the month, it is not significant and $H_0$ is accepted, which means that there is no significant difference in the sales made during different months.]

Group-B
Attempt any SIX questions.
2. Mention the methods of data collection by primary data source.
3. The data given below represent the wages of 30 persons whose average wage is Rs. 45. Calculate the missing frequencies:
Wages in Rs.
No. of persons

oefficient and test the significance of the following data 


arson’s correlation C - 
so rns [96 | [| 9 2 
leebaee. Significant 


5. Identify the regression equation of y on x and x on y from the given regression equations. Also find the correlation coefficient.
50x + 25y = 582 and 87y + 100x = 1913
[Ans. First equation is y on x and second equation is x on y since the sign of both regression coefficients is the same and their product is less than unity. Also, correlation coefficient = −0.812]
6. List out the characteristics of Normal Distribution. 
7. Differentiate the Stratified and Cluster Sampling 
8. Write a short note on census survey and sampling survey. 
Group-C 
Attempt any TWO questions. 
[2 x 10 = 20]


11. ; 
The Government Accounting Off [Ans. i) Rs. 1385.60. ii) Rs. 1724.80] 


ce (GAO) is intere - 
ment. Monty 'N seeing if similar sized offices spend 


Agriculture Office 
District Office 


Ca a ce 
A 
a ae a 


Significance =7.2] eve F signifi = 0.60 
: 8 Cance = 
shies 0, Tabulated Figs(2,11) af. for 1% eve : 


and H, ~ 
tab, at 1% leve] 
: 0 ificance for (2, 11) dJf, it is not signific#! 


His teins, 5 2° signi 
€ gnifi ; 
J€cted.] Cant difference in the average expenditur? 1 


Bk 


